DA-100 Microsoft Power BI – Part 2 Level 1 – Get Data – Home
- Sort and Filter
Now it’s possible that you don’t have the data in the order that you want. Especially if you’re going to reduce rows by removing the top few rows of the bottom few rows. So maybe you didn’t want the very first ones where the foul name starts with the very start of the alphabet 000102. Or maybe you wanted this columnist number, a column to be sorted in the other order, descending order. And then you can remove the rows that you don’t want. Well, you can do that with the sort. And there are two sorts of sorts, ascending and descending. Fairly obvious. So notice I’ve highlighted a step, but I actually want it as an end step. So I’ll highlight the last step and click on Z to A. So that sorted the rows by order descending. Alternatively, I could sort them by ascending. So I can use the sorts here or I can use the sort over here with these little arrows.
Now, while I’m talking about sorting, I can also talk about filtering. Now filtering, we’ve had a look at removing certain rows. That’s what filtering does, but in a more versatile way. Suppose I didn’t want any of the files which have got M four B. Or maybe I didn’t want any of those which are WMA. I can just go down to this drop down list and deselect it just like I can in Excel. So here you can see it has floated the number of rows and now we’re down to a much smaller amount of rows, 190 it looks like. Now, unlike Excel, I can’t then easily unfilter the rows it may look like I’m able to do so. So here I’m just filtering and getting 190 rows and then I can unfilter.
But that’s only because I’m still on this same filtered rows step. So using tables, select rows. If I go to something else, like for instance, I saw descending and now go back into this filtered rows, our WMA option is no longer there. We’ve moved beyond that. We have done something else. So whilst you can unfilter immediately in the sense of correcting your current filter, you can’t then go back in the later step and unfilter what you have filtered. So notice also how this is displayed in M table. Select rows, the previous step and then each. So for each row where the file extension is not equal to WMA. So if I was just to highlight M four B, then it would say where the file extension is equal to M four B. So this allows you to do a little bit maybe of formula manipulation.
You might want to say where number is between 102 hundred. So if I just get rid of this, these photo drawers and these sorted drawers, let’s say I wanted a number filter between 102 hundred. Okay, that’s fine, we can do that. And you’ll also notice there’s an advanced option where you could say is greater than 100 and is less than 100 and where another column is equal to something else and or all. So it’s quite a versatile filter but notice how this is being displayed where the number is greater than or equal to 100 and where it’s less than or equal to 200 so you could extend it and where is not equal to 101. I’ve got an end bracket at the end that shouldn’t be there. So let’s just get rid of that. There we go. So we can use a basic filter and then develop it further for Solo in the M functions. So it gives the same sort of functionality that’s here. But we’re not just stuck to equals greater than less than. We can use other M formulas that you might be learning about either in this part of the course or by reading the manual or something like that.
A quick look at other types of filters. For strings you’ve got begins with, ends with and contains or the opposite. And for dates you’ve got particular ranges, most of which are probably more useful if you’re looking at current data. So when we get to the date functions in Transform, which we’ll be using a data cells which uses current date data and so we will be able to use these filters. So sorting and filtering just like what you can do in Excel, but bear in mind that once you’ve filtered and you’ve gone on to another step, you’ve can’t then on filter.
- Split Column
Now sometimes the data that you get is not exactly in the format you might want it. For example, let’s have a look at this folder path and let’s move it to the end. So let’s just drag it all the way across. Now this folder path has got all of of the path that a particular CD is on. So the vDrive Music album, a decade of excellence and so forth. But what happens if I don’t want them all to be in one big column? What if I wanted them to be in lots of separate columns? So I’m just going to open this back up so we can see there’s lots of information there. Well, I can split it, but how should I split it? Well, I might want to split it based on a delimiter. So for instance, every time we have this slash, we want everything else to go into a new column. And we can do this by using the transform split column. So I can split by delimiter. So I will split by the slash at each instance of this slash.
So I could just say I want the left and mouse one, or I want the right and mouse one. But here I want every single one. I’m going to split itself into columns and you can see it’s going to be generating five columns. So let’s click OK and see what happens. So now you can see folder path has been split into folder path dot one, dot two, dot three, dot four and five. So what use can I make of this? Well, suppose I wanted a report to be based on folder path four, which contains the CD name. So let’s rename this CD name, I just double clicked on it, click close and apply. And once it’s applied I can go and have okay, let’s have a let’s just do a table just to start with CD name and the number of bytes, say, so we got size. So here we can see all of the CD names with the size, which we can obviously easily change into a line chart or something like that.
So you can see Christmas eight 2007 right at the top. So how else could we do this? Well, aggregating size. Suppose we were splitting at say, a space, maybe even just the first space or the last space. How would that be useful? Well, suppose you got somebody’s name Philip Space Burton. Now that’s fine in one field, but you might actually want to have a full name or four names and surname, or maybe full name, middle name, surname. So you would be splitting up based on the delimiter of a space. You’ll notice that we’re using the formulas table split column and with all of these, as I say, you can let the computer do the hard work in working out which formula it is and then if you want to edit it later, fine tune it, then you can do. So what are the split columns can we do?
Well, we can split by a number of characters. So in this case, suppose you didn’t have a text file that was split at various delimiters, it was split at various points, or the 15th character in it was split and then the 20th character. Well, if they’re regular, you can split by the number of characters. So 15th characters say so if I do that to fold a path five, you can see we now have the first 15 characters and then all the rest. Alternatively, maybe you have one which is 15 characters, and then you’ve got five characters, and then you’ve got another ten characters, which is perfectly possible as well. Well, you can do the by positions. So 15 2025, and there we have it split into these locations. However, also notice what didn’t happen. It hasn’t given us the first 14 characters. We have here twelve Concerto for Strings, Il Riposo, that has been changed to three. Five characters of space. Ings no.
Apostrophe another five characters and Il riposo. So if you want the first set of characters as well to appear, then you need one comma 15 2025. In fact, you need zero, because it is zero based. So you can see we are missing the very first character there. So if I change that to a zero, then we are getting the very first character, and then 15 characters later we get in the next few and so forth. So this split columns allows us to fine tune where we split. There are a few more instances we have got split where it changes from lowercase to uppercase, from uppercase to lowercase, from a digit to a non digit, or non digit to a digit. So when might these be useful?
Well, suppose you have got a product chord, which is all numbers, and then the product itself all merged into just one column. So 349-2648 stapler, one two, six glass, they’re not necessarily of the same length and they might not have a space in exactly where you as a delimiter. So maybe you want to split it so that when it goes from digits to non digit, then you make a new column. Bit rarer out of thought. So the main two out of thought would be by delimiter and by positions, which gets it into fragments. So by delimiter when you’ve got things like spaces or backslashes or anything that can be used to identify where the columns stop and end, and then by positions where you can say, well, the first one is going to be 15 characters, the next one’s going to be five, the next one’s going to be five. But do remember you need to start at the very first character being zero. So if you wanted the beginning 15 characters, then you would start with zero comma 15.
- Other Transform activities
In this video we’re going to have a look at the remainder of the transform section of the home tab. So Group By, suppose we have this CD name that we have created by splitting this path down and we wanted to see how many titles there were for each particular CD name. Now you might be saying to me well actually what you should be doing is something things similar to this actually give it as an end visualization. So let’s put say open in which is always going to be there and need to apply the changes. I’ve dragged that into access it should be in value. And so we can see here is the results that we want. And actually I would generally tend to agree with you this sort of summarization that the group by offers generally should be done as the final step. What we’re trying to do here in the Power Query editor is get our data into a format where we can use it well.
So what we’re going to do in a few videos time is have a look at more practical example of this group by. For instance I might say, okay, out of the number of bytes which are available in CD name, what percentage is occupied by each individual one. So for instance, for all of these twelve items we have a number of bites. What percentage is that over the entirety of this particular CD and that might be a useful thing that we can do at a line item, a raw level. So let’s just have a look at this in isolation and in a future video we will be able to combine this with other functions that we’re looking at. So I’m going to group by the CD name. So let’s just click on the Group By and it says Group by CD name. Obviously this is where you can change it here.
And what do I want to do? Well, I want the total number of bytes. So there’s my size. So I’ll call this total bytes and click OK. And what happens is that our table is now replaced with this group table. So remember, just like filtering, I can’t go back a step and still have this. So I can delete this existing step, I can go back to previous steps to see what happened. But our end data now is this grouped data. Now let’s just have a look at the dialog box. This is basic, we do have an advanced version because it might be that you want additional groupings. So do it by type of music, by CD name for instance. So maybe you want all the classicals to be grouped together and within that the CD names and maybe you want additional aggregations. So you can see what aggregations we’ve got sum average median min max count rows, count distinct rows. That means that it removes in the count duplicates. So the count of Aaabb is five, the count distinct is two because there’s only A and B. And then finally there’s count the number of rows that there are.
Notice there’s this little dot that appears. You can see what it does. You can move things down and up when you’ve got multiple columns so that’s the grouped rows data type. So this is saying, well, a particular column is decimal whole number percentage, something to do with date and time text, true or false, also called boolean and some binary data which we generally don’t need to use in this version of the data that we have.
Now, the important thing about this is if you have the wrong data type to a particular column, then the computer will be interpreting it wrongly. So later on we’re going to be looking at an example where the computer is going to import date data from a text file, but it’s going to be in the wrong format and it’s going to interpret it wrongly. And so what we need to do is instead of having the computer auto convert it into a date, we need to convert it into a text. And then we can use date functions to get it to interpret correctly. So usually what the computer decides is fairly good. But we saw much earlier on how when we had the wrong number of zero top rows removed, that it was getting bad data types. If I just show you it again from the very beginning and open up the CDs query and allow the computer to do its thing from the very beginning, you will see that when we get to this number, it’s actually giving it as a whole number. So this one, two, three and text.
And I don’t really know that that’s the best solution for the computer. If we go into the transform, you can see that the number column, we can’t use any of these functions in the number column, but we can use all of these in the text. So while it’s saying ABC, one, two, three is a number and a text field, it is really treating it as text. And if you look at the M language for this change type, it is saying type any. In other words, it doesn’t know what the type is and it’s just saying it could be one of many things to give you a clue. Now, let’s go back to our real query and you can see that the computer has intelligently put particular types, it’s given this number, this column as a whole number type. And if I go to transform it, you can see that the whole array of number column functions is available. It said that this is text over here. It said that this is a date time field. And because of that, very little is available in the text column. There’s some but not others.
But more importantly, the date and time column functions are available. So if the computer has got the wrong date time. You should correct it at the earliest opportunity, even if that means deleting some steps. Now, replace values that as you can see, replaces existing values with new values. So suppose we go back to our folder path and we’ve got V colon. Maybe we’ve got some others there as well. Well, suppose it’s no longer my V drive. Suppose it’s my you drive. Well, I can replace the values with find v colon and replace it with U colon so you can match the entire cell contents. And if you wish, you can also use special characters just like you can do in the advanced version in Microsoft Word. So if I do this, you’ll see that all of the V colons have been replaced by U colons.
This could be quite useful if, for instance, we had some CDs on one computer, some CDs on another computer and it was getting information from two different places and we wanted to unite it into the one set of data with the one set of real data. So you colons, for instance, we wanted to map it to the exact same place. Now, one final thing about data types make sure that you have a valid data type. Suppose I try to change this size from a number to date time. The computer is not going to have a really good time of doing this and you can see lots of errors. Sometimes these errors might be good. The computer is saying I can’t do it and maybe these are data that isn’t really what you want. Maybe you have got a huge report which is divided up into sections with headings and so forth and you just wanted to keep the numbers.
Well, what you could do is remove the rows that contain errors in this particular column and then you would have the right set of data. So errors aren’t always bad, it’s just indicating that there was a problem with probably a conversion earlier on. So in this video we had to look at group by how to replace your existing table with an analysis. Again, not generally recommended in the Power Query Editor though we will be seeing an instance of how we can use it later on. We also saw how you can replace data types and how you can replace values.
- Practice Activity Number 9
Hello, how was this practice activity? So this continued from the previous practice activity where we had a few problems. All of these were being shown as text and the dates were also being shown as text because we had the first month and then the 10th month it wasn’t working out properly. So first of all we need to go back into Power query. So the quickest way to do that is to click on whatever the query you want to read it and then click on Edit query. So first of all, in the date we have got this as an ABC one two three, that’s a general and you can see data type any. Now remember, in the last practice activity the computer automatically changes the date but there was a problem so let’s scroll down and see if we can find the problem. So we need to scroll all the way to the bottom and we have got grand total. The words grand total. We’ve also got a lot of nulls and all of these nulls aren’t hoping anything. So I’ve asked you to filter out anything which is not a date.
So I’m going to filter out the null because a null technically you could say it’s a date, but in this case not really. And the grand total, if you had the nulls in, that’s fine, no big problem. We really wanted to get rid of the grand total. Now, what is the current data type being used? And I asked you to change the data type of all of the fields which you can see are the data type any so that they then become whole numbers. So I’m going to select them all by clicking on the first one and then going to the last holding down shift and clicking on it and going to whole number. So that will solve one of the problems that we had in Power bi, why was treating this as text? Next, I ask you to split the date into month, day and year fields. Now, this is a text field which has got a slash in, so I’m going to split the column by delimiter and the delimiter is the slash. So it’s not being treated as a date specifically, it’s being treated as text. So I’ll split it into each occurrence of that delimiter. So now we have got the month, we’ve got the day and we’ve got the year.
So I’m going to rename those with month. There’s a date, there’s the year and I’m going to remove the day column which is this one. Next I said Oops, all of those references to 1995 on error are in error. They’re meant to read 2017, they’re not really, but for the example of this practice activity so we go to replace values and we change 1995 to 2017 and click OK. And finally, let sum the Greater Manchester values by year so we can just see if there are any problems. So I’m going to group by. So I’m grouping by year. So I’ll click this from Changes from Greater Manchester to year and I want this to be the Greater Manchester sum. That’s just the name of the column, I could pull anything I want.
And this is going to be the sum of Greater Manchester and click OK, so that just allows me to see visually if there’s any problems. So now let’s remove this final step and we close and apply, we update our visualization and one thing that happens is that there is something wrong with one or more of these fields and we no longer have the date field. So I’m going to get rid of the date field and instead I’m going to put the hierarchy of year and then month, and then I’m going to change all of these counts to sums. So now we’ve got something that looks much closer to a proper visualization. And because we have year and then month, we can do things like drill down or expand down one level and all the rest of it. We might also want to change the sorting and other things, but there we have our visualization. So in this practice activity, we went back to a previous practice activity and we error checked some things. We changed the type to whole number.
We changed one field which had a grand total, we got rid of those, we split it out into month, day and year and we replaced values. So hopefully now you are getting a much better handle on the home tab of the Get and Transform and you’ll be able to use all of these functions with some confidence. We will be coming back to these functions later on as we continue through the course. All of these sections are building up on the previous sections, so thank you very much for watching this. And in the next section we’ll be getting multiple files.