DA-100 Microsoft Power BI – Part 3 Level 4 – Text functions
- Text searching
Now, in this section, we’ll be looking at the various text functions and we’ll start with Find and Search. They’re almost exactly the same as in Excel, but there is one particular hitch that you need to be aware of. So Find will find a particular phrase within another phrase, or maybe a letter within a phrase by default, starting at at the first position. So right at the beginning, and if not, starting wherever you want it to start. Now, this is one sensitive base one M is base zero. So for instance good morning. Hi there. The letter G in M is at position number zero, in DAX is at position number one.
So if I do a find for G within there and I start a position number one, then it will give me number one. Now, the one is indeed optional, but I’ll tell you why it’s not really optional in just a few minutes. So if I change that to the letter H, we can see it says it’s a position number 15. But if I change the H, the capital H from upper to lower case, then it won’t find it in the word high. So if I change this to high and it will give me an error. Now, errors in data models are not a good idea and really you need to avoid them. So that’s why there is this a fourth argument value not found. And please can we have this in Excel? It’s really good if anyone from Microsoft is listening, because right now I have to say if error, then give me whatever is like a blank. But I don’t have to do that in Power Bi.
Instead, I just say find the text I in, whatever text it is, starting in position one. And if it’s not there, then give me something else like the number zero. Now, as I said, this won’t find high as a capital as a lowercase hi, because we have capital H I in here. But if you change this from the Find function to the search function, then it will because search is case insensitive. So it doesn’t matter whether it is lower or upper case. However, it is still accent sensitive, so it will treat an E with an accent differently from an E without an accent. So, what I had to do with the Find function before the search function was introduced in, I think 2003, is say, well, I want the upper of this text, convert that all into uppercase and then I’ll look for high in upper. I’ve got that the wrong way around, but I had to convert buff arguments into uppercase.
So let’s have a look at it in Power Bi. I’m in the subcategory table and what I want to do is look for the word bike in the English product subcategory name. So a new column, because I want this to be case insensitive. I don’t care if it’s a lowercase bike or not. A case bike, I’m going to use the search function. So I’m saying bike location equals search, and I’m searching within the English product subcategory name. And that’s the second argument. What I’m looking for, which I need to put first, is bike. So I’m searching for bike within the English product subcategory name. Now, there are hard brackets around start position and not found value. So because there are hard brackets, the computer is saying they are optional. So there we go. Press Enter and it’s going to give me the location of all of the word bikes.
Except it doesn’t. It gives me an error throughout. So what’s going on? Well, it says the search text could not be found. Okay, well, let’s have a look. Row one, yet there’s bike raw two. Yes. Says bike rule three. There it is, row four. Ah. The computer goes, okay, I can’t find the word bike in here. What do I do? Well, I will look at the search function for the not found value. There isn’t a not found value, therefore I’m going to raise an error now, because there’s an error in one row, this permeates the entirety of the column, even those rows where it has been found.
So, whilst the hard brackets indicate that this fourth argument is optional, it isn’t. Every time specify a start position and every time specify a not found value could be zero or it could be blank, or it could be something else. Because if there’s just one wrong raw, you might find the entire column becomes an error. So now we’ve got the right answer. We have got bikes starting at position number ten, six, nine, and everything else apart from a couple down here, are blank. So that’s how to use search and find. Well, search, I think, is a bit more useful because it’s case insensitive. Now, what if I wanted to change all of the bikes to more PEDs? Well, I can do that using the substitute function.
So let’s have a look at the substitute function. It replaces existing text with new text in a text string, and there are four arguments. The text, in other words, what might be altered and then what is going to be altered and what is going to replace it. And then an instance number. So let’s just have a look at this substitute function. So I am going to substitute in the English product so category name the word bike. Notice this is lowercase and I’m going to replace it with the word more PED. And you notice the instance number is optional, so let’s do that and refined. Absolutely nothing happened because substitute is case sensitive. So we need a capital B and a capital M. And now we’ve got our mountain mopeds road morpheds and touring morpeds.
Now, if you wanted to capture both bike and morped, so bike as a capital B and a lowercase B, then what we need to have is two substitutes one looking for the capital and one looking for the lower case. Now, there is a fourth argument which is optional. It says, okay, I want you to find the first instance, the second instance, the fourth instance, and only replace that. If you don’t put in an instance number, then it’s not going to just do the first instance, it’s going to do all of the times that the word bike appears in that particular string, which might be what you want and is generally for me, what you want. So these are the text substitution and searching functions. So just remember, search is case insensitive and find is case sensitive. But for both these functions, I strongly recommend using.
- Text extraction and substitution
Text extraction can be done using some very familiar functions. Left, mid and right. They work exactly as they do in Excel. And as I’ve said in the previous video, DAX is base one. So the first character is character number one instead of M, which the first character is number zero. So probably not that much of an interview deduction that’s needed. Left takes the left two characters.
For instance, mid takes from character position X for as many characters as you want and right takes the right and Moss characters. So let’s take an example. In Power Bi we’ve got in the fact internet sales this sales order number which starts with S or. Now suppose I wanted not the sor, but I wanted to extract the number which is the last five characters. Maybe I want them to convert it into a number and so forth. So I can do that using write. So my extracted text is at the right of the sales order number and it’s the right and most five characters. So there’s the extracted text. But I could say, well, I want from the third position for five characters. So that would be using the mid function.
So I want from the third character for five characters gives me exactly the same result in this particular instance because I know that there are seven characters throughout. Now, suppose I didn’t know that there were seven characters throughout and I wanted the first three digits. So maybe five or nine has this particular departmental meaning. So what I could do is say, well, I want from the third position for three characters perfectly possible. But suppose the S or that we got at the beginning needn’t just be two characters. Maybe it could be one, maybe it could be five. So what I’m trying to get is from the third character to the fifth character, but it could be a variable length. Suppose that I had elsewhere this and I still wanted these three characters.
Now, what we know about these characters is that they are five characters away from the end. So here we have the length being seven and we want to go back five. So we can get the length of the entirety, which is seven, subtract five, but that won’t get us to where we want to be. We want to be not in the second position but the third. And so we have to add one to that. And same here, the length is eight. We want to get to the fourth position and so we go back five, but we still have to add one to get four. So let’s see that in action. So I want to take the length of the sales order deduct five, but then add one. Now, quite often I do have the plus one separately because this five might be a variable number as well. It’s the plus one that’s really important. And then take three characters. So that is how it works with a variable number of characters and I just wanted to go from the end rather than from the beginning. Now, suppose I wanted to replace the Sor with say, sales order. Well, there are many ways of doing that.
So the first way is to take from a particular position. So it could be from the third position. If we know that that’s when the characters end, when the letters end and take five characters, but we might not know that it’s five characters, we might want a huge number, and we could say 999 characters, which is fine as long as we know that this sales order number is not going to be greater than 1001 characters. So the nine nine is some number way beyond what we needed. Now, we could do it scientifically and say give me the sales order number length and minus two from that. So that is another way of doing it. More accurate way perhaps, but quite frankly I just prefer using nine nine nine. It’s a lot simpler than using that. And then from there we can add sales order right to the beginning and join them together using an ampersand. So that’s one way of doing it.
Another way while we had a look in the previous video about substitute so I could substitute in sales order number, get rid of so and give me sales order. So that is another way of doing it. But what if instead of saying well, I want to replace this very specific text and it could be a not specific text, I want instead to replace the first two characters with sales order and that’s when the replace comes in. So this uses the same start number of characters as a lot of these other functions do, but then replaces it with new text. So the difference between replace and substitute is substitute. We know what text we are trying to replace in the replace function, we know where it is. So that’s why I’ve put it into the extract functions.
So I’m going to now replace fact internet sales. I’m going to replace from the first character onwards for two characters. I’m going to get rid of those and substitute or replace it with sales order. And that’s just yet one more alternative way of doing things. So, just to give a further example, suppose I had the text hi, my name is something, what’s yours? Well, I could do that by saying I want the first 20 characters and then say Philip and then continue from character number 22 for as many characters as I want. Alternatively, I could substitute the hash with Philip or Alter. I can replace what’s there at the 21st character for one character with the word Philip.
- Text conversion
Now, we have previously done implicit conversions. For example, suppose I had the string product key and I joined that the field product key, which is a whole number.
So product key is a number. The string product key is text and the computer then combines the two together. It converts the number into a string so it can use the ampersand to concatenate them. Now, that’s an implicit conversion. It’s the computer deciding how to convert one field type into another. Now, this is still a number. And just to prove its number, I’m just going to add a thousand to it. And you can see it operates as a number before it gets converted. So that’s implicit conversions, I have relatively little say as to how it gets converted. Now, for explicit conversions, if you need more control, there are three functions.
The first one is fixed. This rounds a number to a certain number of decimal places and says, do you want commas? And this third argument is a bit confusing because it’s actually asking do you want no commas? So yes, I want no commas, or no, I don’t want no commas. So let’s just see how this works. So if I have the word fixed, the function fixed, I want to convert it to zero decimal places and do I want no commas? It is true that I want no commas. So that gives me a bit more control over the conversion.
Alternatively, I could say false. I don’t want no commas. In other words, I do want commas. And so this will give me one comma, five to eight. Obviously, if you are in a non English locale where you use a dot as a thousand separator, that’s what you’ll see. Value goes the other way. It converts a number that is looking like a number but is actually text into a number. For example, let’s take this fixed result, which has a comma in it. I can change it into a number by putting a value around it. And now you can see that the data type was text. It’s now a decimal number. But the function which has the most amount of convertibility is the format function. Now, you can format numbers given a certain string and they’re an awful lot of formats and they’re broken down into predefined and customed.
So let’s have a look at the predefined first. So we have the specifications of general number, currency, fixed, standard, percent, scientific, yes, no true, false. On, off. Doesn’t this sound rather familiar? It sounds a bit like the formatting that we’ve got here. Decimal numbers, whole numbers, percentage, true, false. So what this function is trying to do is literally convert into plain English. And if you are in a nonenglish local, I’m afraid it is English. It doesn’t accept French or German, for instance. However, if you’re a bit of an excel guru, then you’ll probably be used to these sorts of numeric formats. And these formats are available in DAX. So zero displays a digit or zero hash displays a digit or nothing.
And so forth and it’s even got the section dividers up to four different section dividers. And this is what you can use in Excel. So let’s take the number -45 I can format this number custom.
So I can say if it’s positive, they give me this sort of formatting. If it’s negative, give me this sort of formatting and so on. So you can see it’s a negative. So it goes to the second of these sections, which are separated by semicolons. Now, I’m using red just to actually highlight it. Colons don’t actually work that well in Power Bi. So I would just stick to what you’ve got here, all of the various symbols. Just remember, if you. So choose, you can have a different formatting for positive values, for negative values, for zeros, and for null values. Now, you’ve also got predefined formatting for date and time. And to show you why this is important, let’s just have a column called Ship Date.
And it has Ship date is and the Ship Date column. So this is my ship date string. And you can see I’ve got no control over how the date is displayed here. It’s displayed in American short date format. However, having a look at the various predefined formats, we have general, long and medium date, short date, long, medium and short time. So if I was to go back in here and format this as a long date, you can see it changes from the short date to Wednesday, August 8, 2007.
Of course, if you’re an Excel Guru, then you’ll know you can use custom dates as well. So we can have DS for days or dates, w for weeks, M for months, y for years, HS, for hours, N all M for minutes, and S for seconds. Now, why I say M all N is because the computer is now intelligent enough to know that if you use an M immediately following a heat, then you mean minutes rather than months. It used to be that you had to use a capital M, that’s for months. It’s no longer the case. You can use lower case as well. The computer is intelligent enough to work out what you mean. So let’s say I wanted this in British format as opposed to the American format.
So date first, and then month, and then year. Notice I’m just using one of each, and you can see what we’ve got. 24 8220. So this Y by itself is not really working. Let’s have a look and see what a single Y means. It says display the day of the year. So we need at least two YS to get the year. So eight, eight or seven. Notice what happens if I add an extra character to each. We now have 080807. And if I keep adding, we now have a different form of day and month expression. Wed OG or seven. So let’s add the day in as well, the date. So wed eight, OG or seven. And then let’s put lots of things into four characters. And now we have Wednesday, 8 August 2007. So hopefully you can see how powerful the format function is for converting numbers and dates into strings. And if you’re familiar with the predefined or custom numeric formats, then you’ve got the hyperlinks within the spreadsheet to have a read.
- Other functions
A quick look at all of the remaining text functions. While blank is classified as a text function, it can also be used in numeric columns. It returns a blank, so it’s the equivalent of a null. In other languages, chord returns the ASCII value. So for instance, a capital A would return. Usually a chord of 65. Concatenate is an equivalent of using the ampersand and a lot more on wheeldes you can see. But there’s also concatenate X, which goes through an entire table or part of a table and joins everything together using a delimiter exact sees if two strings are the same, ignoring formatting differences. Lower and upper. Notice that there isn’t a proper function like there is in some languages. Converts text either entirely to lowercase or upper case. So there is no function for capitalizing the first letter of each word REPT that repeats text a lot of times.
Now, Trim is actually out of all of these functions, with the possible exception of blank, the one that I use the most, it removes excess spaces from the beginning of a string, excess spaces at the end of a string. So if you get your data as a 20 character string, regardless of how many characters there are used in it, it removes any spaces from the end. So if I had hello with 15 spaces at the end, after using Trim, it would just have five characters. And it also removes spaces from in between words apart from single spaces. So if you had a double space or triple space, it replaces those with one. Now, other languages use function such as L Trim and R Trim. You don’t have this in DAX, but for me, the single trim function is a lot more powerful. So these are the remaining text functions.