DA-100 Microsoft Power BI – Part 4 Section 5 – Manage datasets Part 2
- Configure a Dataset Scheduled Refresh
In the previous video, we installed a gateway connection which will allow Power Bi service to connect to my computer. In this video, we’ll have a look at how we can refresh the data automatically. Now, your data source cannot be refreshed because your credentials are invalid. Okay, let’s edit our credentials. So we can either use Windows without impersonation, or we can use other categories that it gives me. So I’m going to click no Privacy setting for this click sign. And there we go. So next we have to say, when do we want it to be refreshed? So here’s our schedule refresh, and you can see that the schedule refresh keep your data up to date is currently off.
So I’ll switch it on. So we can say we want this to happen every day, or we can say we want it to happen specific days of the week. It says weekly, but that’s not quite true. Won’t do it just one day a week. This is when you can say, well, I want it to happen on Tuesdays, Thursdays and Saturdays, so I want this daily. And let’s say I want this to happen at 12:30 p. m. . Now, it’s usually best, of course, to have a time when a the data will have changed, and b when it’s not too busy on your network, perhaps. So I’m just going to say 02:30 A. m. , and let’s pretend that the data was updated at 01:00 A. m. . Each day takes about an hour, so I’m going to also do it at 03:00 in the morning. Now, how many of these can I actually do? Well, I can do eight refreshers a day for power.
Bi pro. If you’re on Power Bi Premium, then you can do 48 times a day, that’s one every half hour. And you notice I was only able to select the hour or the half hour. Now, because I said wanted to start at 1230, it doesn’t mean that it would necessarily start at 1230 because I’m not the only person to say, can Power Bi service do this at 1230? There might be some capacity issues, so it could take up to an hour to start. Now, it doesn’t want very long, drawn out database refreshes. So if your data set takes longer than 2 hours to refresh, or 5 hours if you’re on Power Bi Premium, then you might want to consider incremental refresh settings and we’ll have a look at that in another video. Now, what happens if it doesn’t work? Well, the refresh will be paused if there is a failure four times in a row.
So fails once, twice, three times, and then the fourth time, and you can’t change that number four, that will happen. You can’t say I want it to stop if there’s eight time failures in a row, or if it’s pointless if there is no reports that uses this data set that have been viewed for the last two months. So you may have a report. But if nobody’s using it, why should Power Bi service use all of its compute capacity on something that doesn’t matter? So what happens if the refresh fails? Well, there are two options which you can see here in Power Bi service. I would strongly suggest checking send refresh failure notifications to the database owner and putting additional people just in case the database owner happens to be on holiday.
So let’s click apply. Now, it should be noted that this is for data refreshing. It is not for model refreshing. So if I go into my model and I change the model, say I got rid of this last stage that appended the query onto itself and thereby double the amount of sales volume, then that’s a change of the model that won’t be updated, it’s when the data changes. So the data which comes from the source here, so this is from an Excel workbook, for instance. So let’s go into that workbook and change the data. So here is that workbook, and we’re just using the sales amount. And I’m going to change this very slightly from $3,578 to one with a huge number of notes.
So that is 100 billion. So let’s save that data. And so when the computer gets the data again through the gateway, then the data will be refreshed on the Power Bi service. Why not do this now? So let’s go back into the Power Bi service. Let’s just make sure that it has actually kept all of this autorefresh. So I’ll go into my workspace and into settings. Lots of ways you can get into it. So there we can see it’s still going to happen. And you can see the next refresh is going to happen at 1230. Well, I want to manually refresh it. So let’s go into our data set and I’m going to click on refresh. So currently we’re using it in things like this dashboard, while the amount is 58 million. We’re expecting this to go by 100 million if I’ve gone my last correct. So let’s go back into the data set and I’m going to refresh it now. I can click on this refresh button as many times as I want.
There is a limit to scheduled refreshes that will only happen eight times a day. But I can manually refresh as often as I wish. So I’m going to click refresh. Now, notice we’ve got columns for refreshed and next refreshed. And it doesn’t take long because there’s only four Excel data sources that it’s going to grab. So you can see it’s finished now. It might just take a few minutes for it to work through the system. It’s not necessarily instantaneous, so I’m just going to pause the video and just wait a few minutes. Right. Having done that, we’ll go into our dashboard. Remember, this was saying 58 million, or it’s now saying 200 billion. So it doubled the amount afterwards.
But you can see it has made a serious impact on the data. So what I’m going to do next is open up the data again, go into this sales amount, get rid of a couple of zeros, and now I’m not going to do anything. I’m going to let the computer auto refresh, because it’s going to have a scheduled refresh at 1230. So I’m going to pause the video and come back without having done anything. Well, it’s taken Power Bi quite a long time to refresh, but it was meant to refresh at 1230. It’s now 02:00. However, you can see it is finally refreshed. So the sales amount has gone down from 200 billion to 2 billion. So in this video, we have had a look at scheduling refreshers. We’ve also seen how to do it manually, and we’ve also seen the limitations, how many times you can do it, and who gets notified if it fails.
- Configure Incremental Refresh Settings – Step 1
Now generally in this course, we’ve been working with fairly small data sets. However, you can have potentially huge data sets which are gigabytes terabytes, and they may not fit in your Power Bi desktop model. So how are you actually going to do that? How are you actually going to create a model that you can then upload into the Power Bi service? Well, you can do this with something called incremental refresh. So what this means is you don’t necessarily give Power Bi service the entirety of the data. You give it a small section and then when it refreshes, it will expand that section to all of the relevant data. And then when it refreshes again, it will just get the latest data. So instead of having to download gigabytes into your Power Bi desktop, you’re downloading Megabytes, then you’re uploading Megabytes, then you’re uploading all of the data gigabytes.
But then on each refresh, you’re just uploading the most recent data. So megabytes. So this could be things like, for instance, the Internet of Things. You might have a reading from a sensor every 6 seconds and you’ve got 100,000 sensors where you can imagine that that is going to be huge very quickly. So how can we configure incremental refresh? Well, first of all, you might want to know that this is a relatively new addition to Power Bi Pro, at least at the time I’m recording this. It was a preview for quite a while on Power Bi Premium, but it became mainstreamed, what’s called general availability in Power Bi Premium in February 2020. And at the same time, it got added into Power Bi Pro. So you’ll understand from this that if you’re using the free version, you won’t have access to the incremental refresh. So it is a two stage process. The first stage is that we need to reduce the amount of data that we’ve got in these tables. And we can do this by editing the tables.
Now, you either do this right when you’ve loaded the data initially. The advantage of doing this if you’ve got, say, an SQL Server data set or something like that, data source, is that you can take advantage of native query folding. So the query folding that you can do, you can have a look at the native query and see if query folding is part of that. And what that means, if you remember back to part two, is that it might take a lot of steps and put them into just one query for the end database to get. So instead of having to download all the data and then filter it, it just retrieves a smaller section so it’s quicker to get. And then the second advantage is you can manipulate it in Power Bi Desktop, actually build your model, build your visualizations without having a huge amount of data. So I’m going to put a new step in here, but before I do that, I need two parameters.
So I’m going to go to home, manage parameters, new parameters, and these parameters have to have the exact name I’m going to give you. It is range start and the other one is called range end. So English words if you’re using a foreign language, it is range start, null spaces. These are what’s called reserved parameter names. And the type has to be date time. And what the computer then wants is a suggested start. So what is your data range that you are going to personally be working with? So I’m going to say it is going to be the 1 January 2006 and I’m going to have another one, range end.
Again, it’s got to be date type, date time type, and I’m just going to give it a figure. So this is just my initial values. And if you want, you can say 23, 59, 59, etc. You can see the computer’s automatically inserted the time when I haven’t put it in. So these are your parameters which will appear here on the left hand side. Don’t worry if you get these dates wrong, there are no wrong values. First of all, theoretically, if you’re say putting in 2032 your dates which simply aren’t in your data set, that could be the wrong value, right? So now I want to partition this table down. So what I’m going to be using is my order date. And again, don’t worry if you get the wrong field.
This is just for your initial data upload and then you’ll have all of the rest of the data. But it would be helpful if you choose something which is going to be in chronological order. So if I get a new value, it’s not going to insert itself somewhere in the middle, it’s going to be right at the end or very close to the end of the existing data. So what I need to do is now filter this down so it’s going to go right at the end because before that we haven’t got the right type.
And this again needs to be a date type. At the moment it’s a date. I’m going to change that to a date type. So now let’s add a photo. So I’m going to use a custom filter and this custom photo is going to say where the date is after or equal to the parameter range start and is before the parameter range end. Now please note one of these has to be or equal to. Otherwise you’re going to have gaps. And if you have both of those have be equal to, then you’re going to have duplicated data. You don’t want that. So I would stick with is after or equal to and is before, right? So that reduces the amount of data that we’ve got. So that’s the first step, close and apply. And it’s going to update my query. Needless to say, I’m not using something that is terabytes in size. So it’s going to work very quickly. So that’s step one. Initial.
- Configure Incremental Refresh Settings – Step 2
Step two is to define a policy. So I’m going to go into incremental refresh and this new dialog box comes up. So right at the top, I don’t know that this data set, this data source that you have got, whether it can be query folded, if you can’t, it’s not recommended to use incremental refresh. Why is that? Well, let’s say I’ve got a spreadsheet with billion rows. Not possible in real life, I think, but let’s say I’ve got one with a billion rows and I can’t use query folding. So to do a filter, I’ve got to load the billion rows and then filter it down so the local machine has to do that work. Instead of saying, okay, database just gives me this particular partition that I’m looking for, so that’s fine. So, yeah, okay, I’m not using a query folded, I’m going to ignore that. This is just an example. I could use SQL Server and get it from there, but I’m well aware not everybody has SQL Server and will not be able to follow along. So I’m going to do an incremental refresh. Now, notice this other thing.
Once you’ve deployed this table to the Power Bi service, you won’t be able to download it back. Why is that? Well, Power Bi, Microsoft anticipating that you’re going to have billions, trillions of roles and it doesn’t particularly want you to download a file so big you can’t actually use it on your desktop computer. So I want the number of rows to be in this date range. So I don’t necessarily want everything. I might have data going back 30 years, but I only want the last ten years. Well, I can say I want only the last 15 years to be stored on the Power Bi service. Now, when I do a refresh, I want you to or an incremental refresh, I want you to just refresh the last number of, say, five days, so I’m not refreshing a lot. And we learnt in the previous video that you can refresh up to eight times a day for Power Bi Pro, 48 times a day for Power Bi premium.
So having say, five days, ten days, that’s actually going to work quite nicely. You can restrict it further by saying, I only want you to update in the last five days if this due date column or order date column or ship date column, this particular column, the maximum value has changed. In other words, it’s not like nothing has happened. So the last order was at twelve midnight on one particular day, and I come to do an incremental refresh and it’s still twelve midnight on that particular day. There have been no new orders, I think, therefore don’t refresh the data. So that is another way of reducing the amount of data that’s quite transferred. If there’s nothing new, don’t bother. But it looks at the maximum value of a particular date time column, as in is something new and then finally, you can choose only to refresh complete days. So let’s say it’s 01:00 p. m on a Tuesday.
Well, it won’t download anything from Tuesday if you check this box, because Tuesday is not yet complete. So it will do Monday, Sunday, Saturday and Friday the last five days. And it won’t do Thursday from 01:00 P. m.
Onwards because that is not a complete day. So let’s apply all. And now I’m going to save this. So remember, you can’t redownload it. So I’ll save it as a new Power Bi model and then I’m going to publish it. And that’s all there is. There we go. I was wondering when that dialog box is going to come up. So, incremental refresh, there are two aspects. First of all, you need a smaller section of the data so you can work with it in Power Bi desktop. And then once you’ve done that, using the parameters range start and range end. And by the way, you will need those parameters because if you try to do incremental refresh on a table that hasn’t got those parameters, you can’t do it.
Once you’ve done that, then you go into incremental refresh and you set the date range for what you want to store and the date range for what you want to refresh. So now let’s see if it works. So, I’m going to add in a bar chart, which is going to have the order date, that’s the one that I’ve used as a filter. And the sales amount there’s my values. And I’m just going to drill down. And you can see we have successfully got 2006, January, 2006, December. So the filter works. So I’m going to resave this and I’m going to republish this. So, yes, I do want this report to be updated. So that success. Let’s bring in our Power Bi service window and let’s have a look at how reports should be identical to what we’ve previously got, january to December.
So now let’s see if we can refresh it. We’re going to settings. I want to use the data gateway, personal gateway that we set up previously, yet that’s all set up fine. So now let’s refresh. So, preparing for refresh. So at the moment we have a reduced data set. It should be getting the totality of the data now of the last 15 years anyway. So it’s still working through it, as you can see. So this is when it’s now downloading everything, and not just the 2006 that I had previously filtered upon. Right, so it’s finished. So let’s go into our workspace. Refresh. And now you can see it has got everything in from 2005 through to 2008. This big figure, obviously is because I have manipulated the data set a few times. But the important thing here is that it has successfully grabbed all of the data.
And if I was to put in more data in the last five days and then refresh, it would then get just that very latest data. So in this video, we’ve had a look at incremental refresh. We found out how to set it up. We can publish it with the policies on the Power Bi desktop, and then we can refresh the data to get all of the data that wasn’t included in the initial data upload. And then, if you so wish, we can then go into settings, go into schedule, refresh refresh, and keep the data up to date, but only by downloading uploading as little data as is needed.