DA-100 Microsoft Power BI – Part 2 Level 9 – Other Get and Transform Topics for the DA-100 exam
- Resolve problems
Hello and welcome to this final level of part two. In this level we're going to have a look at certain aspects of the DA-100 exam that didn't quite fit nicely into the other eight levels. Now, we're going to start with a few items that we have actually covered, but without explicitly saying, this is how you resolve this particular problem. So what I've done is I've just imported our basic Pivot table, HPI regions three. Now, there are some inconsistencies. For instance, we have numbers, and we have two labels at the top. And what the computer did initially was to promote row one to a header, when what we actually needed was to promote row two. Or it could be that you have a column which is entirely full of integers, all except maybe for one or two rows. It could be that there's a total row right at the bottom, for instance. So resolving inconsistencies first means working out what on earth is going on. And you can do that by looking, for instance, at the type of value.
So if you see that you have a couple of text values in a numeric column, then you might be considering: why are they there? Have a look at the data type. Is there a way of actually getting the column into a better data type? And of course, you can have a look at the column distribution and you can have a look at the column profile. So here you can see a rough version at the top of the column, but when we go into the column profile, we get a fuller version of the column distribution. Now, if we do have unexpected values, like for instance a couple of zeros at the bottom, then you know how to deal with those. You can use, for instance, Replace Values. So maybe I want to replace those zero values with null. Now, make sure you don't write null with a capital N. It's got to be all lowercase: n, u, double l.
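Just as a reminder of what the generated M looks like for that kind of replacement, here's a minimal sketch, assuming a previous step called Source and a numeric column called Price (both names are just for illustration):

    // Replace literal 0 values with null, but only in the Price column.
    // Note the all-lowercase null: "Null" or "NULL" would be an error in M.
    = Table.ReplaceValue(Source, 0, null, Replacer.ReplaceValue, {"Price"})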
So that could be a way of dealing with such a problem. Now we have got two empty values, which better aligns with what the data should be telling me. In a similar way, we can use the column profile, column distribution and column quality to have a look at whether there are data quality problems. For instance, let's convert this column into a decimal number, and we can see lots of errors, and we can detect those quite easily. If we go into the column quality, we have 94% errors. So it's fairly obvious on the face of it that there are errors.
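If you'd rather handle conversion errors in the M code itself, a try ... otherwise expression catches them row by row. A minimal sketch, assuming a column called Price that occasionally contains text (the names are hypothetical):

    // Add a cleaned column: convert to a number where possible, otherwise null.
    = Table.AddColumn(Source, "Price Clean",
        each try Number.From([Price]) otherwise null,
        type nullable number)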
But you may remember previous examples where the errors are a long way down in the data. And you'll also notice the red line going across; that tells us that there were errors even without seeing the column quality. And then we've also got some user-unfriendly values that are possible. For instance, we had 1995, quarter one, Jan, and we separated them out into three separate columns and then combined them into a single date, so we had January 1995. But it might be that you have got codes used. For instance, you might have a code: one for success, two for failure, three for unknown. And you don't want to have a one, two or three; you actually want to have success, failure, unknown. So this is when, for instance, you could use Replace Values or you could use a custom column.
So if I go to Add Column and Custom Column, we could use an if, or we could use a conditional column, or we could import an additional table that maps one to success, two to failure, three to unknown, and then combine them; we can merge those queries together. For the custom column route, see the sketch below. So, lots of ways of applying user-friendly value replacements. There's another line in the DA-100 exam called choose an appropriate query type. Now, I've done plenty of research, and have researched other providers of DA-100 videos, and I think the general consensus is that nobody has any real idea what this section is all about. Choose an appropriate query type? Well, you create a new source, you choose what an appropriate source is, and then you manipulate it. There's no such thing, really, as an appropriate query type.
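For the custom column route, the M would look something like this sketch, assuming a numeric column called Status Code (the column and step names are hypothetical):

    // Translate the numeric code into a user-friendly label.
    = Table.AddColumn(Source, "Status",
        each if [Status Code] = 1 then "Success"
             else if [Status Code] = 2 then "Failure"
             else "Unknown",
        type text)

The merge route achieves the same thing with a lookup table instead, which tends to be easier to maintain when there are many codes.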
So in this video we just had a reminder about how to resolve inconsistencies, unexpected or null values, and data quality issues. Often the problem is actually detecting them rather than resolving them, so we've got a few ways of actually detecting them. And of course, you can always load this into a visualization and detect the inconsistencies, unexpected or null values there. And how to resolve them? Well, you can use Replace Values, or you can use additional tables, which can be quite useful for applying user-friendly value replacements. And at the end, what you should have is a data set which has integrity, which doesn't have these inconsistencies and unexpected values; which is the data that you expect. And that's entirely what Get and Transform is all about.
- Configure data loading
In this video we're going to have a look at configuring data loading, and you can imagine data loading being right at the outset: when we have a query, the first thing we've got to do is to load it from somewhere. However, configuring data loading is not actually part of the Power Query Editor. Instead, it's in File, Options and settings, Options. There are two separate Data Load sections, one under Global and one under Current File, and we're going to take each in turn. First of all, the Global section. So this is all files rather than just one particular file. We've got something about unstructured sources. These could be, for instance, from Azure Cosmos DB. So we've got something that's not a relational database; it could just be a bunch of documents, each of which could have a different schema, as it's called: different column types, different headers. So you can see the choices.
You can always detect column types and headers, detect according to each file's setting, or never detect. We've also got auto date/time. So what's this all about? Well, when you have a date, we've seen in part one of this course that you have a hierarchy automatically created: we have year, we have quarter, we have month and we have day. Auto date/time can be quite useful because it allows for this hierarchy to be created. However, this hierarchy is only useful if you agree with its contents. For instance, we have a year. Now, that year is a calendar year, and it might be that you want a different form of year, an academic year perhaps; it might start in April, it might start in September. So auto date/time is useful, but only insofar as you find it useful. If you don't want to use it, then uncheck it. Also make sure that your date/time data is reasonable, in terms of the actual values it's using.
For example, there was one data source which had an unknown date type, and rather than using nulls, it used the year 9999. Well, the auto date/time table had to go all the way to the year 9999, and it just took a huge amount of memory, literally hundreds of megabytes, which had a performance impact. So, not good. We also have cache settings, which you probably won't need to use: whether the query preview results are stored on your hard drive. And you can see we're nowhere near the limit, nowhere near needing to expand this, but if we ever did, this is where it is. Now, for your current file there are some additional settings.
So we start off with detecting column types and headers for unstructured sources, which is inherited from the Global section; whether relationships are detected when data is first loaded, or when you are refreshing the data; and whether the computer should try and discover additional data types after the data has been loaded. Again, we've got the question of: for this particular file, do you want an auto date/time table? That, again, is inherited from the Global section, but you can adjust it for an individual file. Then whether you want data to be downloaded in the background, so the data previews can keep working if you have got a lot of tables. Then whether the tables should be loaded all at once. So instead of it being sequential (I load table one, then I load table two, then I load table three), you say: I want tables one, two and three loaded all at once. That might improve performance, but then you're asking the source, assuming it's one source, to do three things at once, so it might degrade the performance or reliability of that server. And then whether you want Q&A to be turned on. So these are the data loading configurations, and you've got two separate sections: one for all files as a default, and then one for your current file.
- Identify query performance issues
In this video we're going to talk about identifying query performance issues. Now, this is a table which has been loaded from SQL Server. It's a very small table, only 290 rows, so this in itself is not going to have any performance issues. But suppose we've got a table with a million, a billion, a trillion rows perhaps. Well, let's have a look at two basic things that can help. The first one is query folding. You can see that, wherever we've got View Native Query available, the computer has been able to take all of the steps so far and just join them together into one SELECT statement. This is good. This means that we are asking the server, in this case SQL Server, but it could be any other type of server, just one question.
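To make that concrete, here's a sketch of the kind of steps that typically fold back to the server; the server, database, table and column names are all hypothetical:

    let
        // Connect to SQL Server and pick a table.
        Source = Sql.Database("myserver", "SalesDB"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],

        // These steps typically fold: they become a WHERE clause and a
        // column list in the single SELECT statement sent to the server.
        Filtered = Table.SelectRows(Orders, each [OrderYear] = 2020),
        Kept = Table.SelectColumns(Filtered, {"OrderID", "Amount"})
    in
        Kept

A step like Table.AddIndexColumn, by contrast, generally has no SQL equivalent, so folding stops there and everything afterwards runs locally.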
And the advantage of asking just one question is that we're not pulling in additional rows where we don't need to, we're not pulling in additional columns or fields where we don't need to; we're just getting what we want. Now, there may come a point somewhere in this step process where we're no longer able to view the native query; the query folding just stops happening. So from that point on, that's where you might start to have additional performance issues. Now, it might be completely fine; it might be that you've just got some trivial things afterwards and you don't need it all to be query folded. Or it might be that there is a problem. Now, if you do have such a problem, where is it? At what point? Well, this is when we can use a new ribbon section called Tools to help, and this is all about diagnostics, so finding out where things happen. So for instance, suppose I wanted to find out about this Navigation step. Well, I could go into this step, diagnose this step, and see how long things take.
Now, my computer just crashed when I tried to do that, so I'll do it again with this Excel version. So we looked at all of these various steps, and the computer goes, okay, what is needed for this particular step? And you can see here we have got evaluation: we're getting results, we're compiling, we're evaluating. And then if we go further across, we have a percentage, an exclusive duration, and a duration. So let's have a look at these columns, plus the operation, plus the category, by themselves. There may be other columns you want, but we can see here that the majority of the time is used in the Evaluate section. So, Start Diagnostics and Stop Diagnostics: that actually gives you diagnostics across all of the query steps. Diagnostic Options: you can see that you've got very few options at the moment, but they may expand as this gets more mainstream; at the time of recording, this was introduced very recently. And you've got Diagnose Step, so if you just want a single step to be looked at, then that can do it. However you choose to do it, you can get some diagnostics and look at what steps might be causing these problems. So now I've done that, you can see that it's done a much more detailed step diagnostics. Here we've just got one ID.
Whereas over here we have got multiple IDs, as I've been walking through all of these various steps. So if you are importing from a source that allows query folding, have a look and see where the query folding stops. Otherwise, have a look at the diagnostics and see what particular point in your loading and transforming is causing a problem, and then have a think outside the box. Maybe, instead of having a series of steps that go up to a particular query folding moment and then some more afterwards, you can write a SQL statement that does it all, or maybe you can simplify earlier sections. So this tells you why something is going slow, and then it's down to you to try to cure it.
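If you do go down the route of writing one SQL statement that does it all, you can pass it straight in when you connect. A sketch, again with hypothetical names:

    // Hand the whole query to the server in one go.
    // Note: supplying a Query like this generally prevents any further
    // folding, so do as much of the work as possible inside the SQL itself.
    = Sql.Database("myserver", "SalesDB",
        [Query = "SELECT OrderID, Amount FROM dbo.Orders WHERE OrderYear = 2020"])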
- Apply AI Insights
In this video I'd like to talk about how to apply AI Insights, and this is fairly recent at the time of recording; it was introduced in June 2020. So if you've got a version of Power BI which is before that, then you might need to enable it. So go back to the Power BI desktop, go to File, Options and settings, Options, and go into Preview features. You might see an AI Insights function browser, and this is where you would go if there are any new features that you are interested in, like, for instance, question and answer support in Spanish. Now, there are two main sections of AI Insights: Text Analytics and Vision, and Azure Machine Learning.
Now, Text Analytics and Vision both require Power BI Premium. So that is the really expensive version, starting at around $5,000 a month. Now, the features that are enabled with Text Analytics are, first of all, extracting key phrases. So there's a key phrase extraction function: you've got huge chunks of text, and it looks at what it thinks the key phrases might be. Then there is sentiment scoring. What the score sentiment function does is take a range of inputs and classify them from zero, which is negative, all the way through to one, which is positive.
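Once the scoring step has run, you just get a numeric column back, so an ordinary custom column can turn the zero-to-one score into labels. A sketch, assuming the previous step is called Scored and the added column is called Sentiment Score (both names depend on what the wizard generates, so treat them as hypothetical):

    // Bucket the zero-to-one sentiment score into readable labels.
    = Table.AddColumn(Scored, "Sentiment",
        each if [Sentiment Score] >= 0.6 then "Positive"
             else if [Sentiment Score] <= 0.4 then "Negative"
             else "Neutral",
        type text)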
So, for instance, you might be using this for social media, for customer reviews, for discussion forums, to see what the sentiment is, how happy people are with a particular service or product. You can also use Text Analytics for language detection, so trying to work out what language a particular comment is in. As you can see, none of this is available to me, and probably not to you, unless you have access to a Premium capacity with an AI workload enabled. So if you have, then you can start whatever process you want. There is also Vision as part of AI Insights, and what this does is tag images. It's got a database of around 2,000 recognizable objects: living beings such as people, pets or animals, scenery and actions. So what the computer is doing is looking at the picture and going, I think that is such-and-such. So let's say you have a million pictures given to you.
How many of those are of people, how many of those are of animals, how many of those are scenery? Well, the computer is able to work this out. Now, at the time of recording, image tagging is supported in English, Spanish, Japanese, Portuguese and Chinese. Sentiment analysis is supported in English, German, Spanish and French. And then we've got a separate category: Azure Machine Learning. What is needed for this is a data scientist who has Azure Machine Learning. So, Microsoft Azure: this is a cloud, and there are lots of different components of Microsoft Azure, one of which is the AI element. So you've got a different person, a data scientist, who builds the machine learning model, and then you, the BI analyst, use it through these Azure Machine Learning models. So what can you do with machine learning? Well, let's say you have 1,000 or 10,000 days of atmospheric conditions, including the pressure, the wind speed and the amount of rain.
And then you give the computer some more data, for instance the pressure and the wind speed, and say, okay, given this and given what you have learnt, what's the likelihood of rain going to be? Now, you don't really need to know more than that. They are available in the Home section, in AI Insights, if you want to have a look at what the actual pages are. So here is the Text Analytics page: you can see detect language, extract key phrases and score sentiment would be shown. So let's say I wanted to score sentiment; well, I want to use a particular field and a particular language. And if we're using machine learning, then again we would be pointing to a particular machine learning model and saying, okay, have a look at this, what do you make of it? So this is how you can apply AI Insights. First of all, you need Power BI Premium. And secondly, you either need an AI workload for Text Analytics and Vision, or, for Azure Machine Learning, you need somebody with a machine learning model that you have access to.
- End of Part 2
Well, congratulations. We are halfway through the course. You have now got to the end of part two, and let's just have a look at the requirements of the DA-100 exam. So, we've seen how to get data from different data sources. We've looked at settings, we've looked at data sets, we've looked at performance issues and parameters, getting quite advanced.
In some of these other sections we've looked at profiling the data, so you can identify data anomalies and interrogate data statistics. And then we've had a look, for the most part, at cleaning, transforming and loading the data. So we've been looking at the M code to transform column data types. Now, let's just have a look at each one of these. Resolve inconsistencies, unexpected or null values, and data quality issues: you've seen how to do replacing values, and we've seen how to cope with errors. Apply user-friendly value replacements.
So if you've got an unhelpful computer code, you've seen how you can either use the Replace Values functionality or you can indeed use the M language to do a series of ifs. Identify and create appropriate keys for joins: we've seen how we can merge queries together. We've transformed column data types. We've applied data shape transformations to various table structures. We've combined queries. You've seen how you can rename columns and queries so you can make them more user-friendly. The Advanced Editor: we've had a look at that, so we can modify M code through there. We've looked at configuring data loading, and how to resolve data import errors. So there was a lot in this part. And if you are going forward for the DA-100 exam, then do make sure you get to grips with the M language.