DA-100 Microsoft Power BI – Part 2 Level 8 – Get other types of data
- Introduction to SQL Server
Hello. And in this section, what we’re going to do, concluding our look at the Power query section of Power Bi is connecting to various different data sources. Now there are obviously a huge number of data sources and we can’t possibly go through them all. So what we’re going to do is look at the requirements of the official All Microsoft exam and so what we’ve got is connect to files and folders.
Well, we did that previously when we had a look at January and February and then merged in Mays and Marches and April, details we’ve been importing from Excel for most of this course. So that leaves us with databases and then SQL, Azure, Big Data and SSAs. So we’re going to start off with connecting to a database. Now for this video and the next video, I am not expecting that you actually have SQL Server loaded onto your computer and it’s probably a waste of time to be honest, because it takes a bit of a while to load it on computer, then you’ve got to load the database and so forth. So this is a bit more of a follow on what’s happening on screen rather than actually doing it.
So SQL Server, we can create a database in SQL Server which holds lots of tables. So as you can imagine, it is analogous to a big workbook, Excel workbook, containing lots of spreadsheets. So before I connect, there is something you might need to know and that is the authentication how to prove that it’s you. Otherwise anybody could just connect to your precious data. And there are two main ways of connecting to SQL Server windows authentication.
So you need your Windows username and your Windows password. And that is good, that’s a general way of doing that where possible because the computer itself has had to log you into Windows and go through lots of checks to ensure that your password is still current. If you’re in a corporate environment, it meets complexity requirements, complexity requirements and all the rest of it.
SQL Server that requires you to have a login and a password and is generally thought to be a little less useful than the Windows authentication. For example, I have to have to be connected to the network or have a computer that has been connected to the network to be able to connect to a corporate network. With SQL Server authentication I could be anywhere on the Internet, on any computer and just connect using a login and password. So I’m going to connect to this database.
And the database could have a really complex name, could have lots of things with a backslash, but in this particular case, my database or my server is just going to be called Dot and that indicates it’s the local database connected to my computer. So I’ll connect and there we have an entry into SQL Server. Within that there are lots of databases.
So these are the workbooks as you will. And within that there are the individual tables. So I’m going to go into Adventure Works 2014, and of course just because I’m in the front door doesn’t mean I’ve actually got rights to use any of these particular databases. It just so happens I’m doing this. But authentication, seeing if you can get into an SQL Server database, that really is an issue for your database administrator. So you can see all of the tables that are connected or that are part of this database. And I’m going to concentrate on three.
So production, dot, product. So just so you know, this last space after the dot is called the table name, the first page is called the schema name. So in the Schema production I’m looking at the product table, the product subcategory table and the product category table. So what I could do in SQL Server is write a little piece of code to get everything from production product. And this bit is relevant to what we’re doing in Power Bi. I’m not trying to teach you how to use SQL Server.
So there are two aspects here, select and the star of the asterisk means all the fields from and then the name. So here we have lots and lots of columns and it could be a huge number of roles. In this particular case it’s only 504 rows. But just imagine a production database, you could have millions, tens of millions, hundreds of millions, even billions of rows. And if you didn’t want all of those columns, then you would specify which columns you wanted.
So for example, maybe I wanted name and maybe I wanted to call that name, that column product name. Maybe I wanted the size. And then if I run that you can see we have got a much smaller version. And then maybe I wanted product subcategory ID. And generally of course, the less information you get from the database, the quicker it’s going to be in, the less dramatic is going to be on SQL Server.
- Importing database data into Power BI, and Query Folding
Let’s now go back into Power Bi and go into SQL Server database. So I’ll connect using my connector. So the server name is Dot. If you have a complicated server name then you can and you’ve got access to SSMS SQL Server Management studio. You can go to Connect Database Engine and whatever that complicated thing is just highlight it and CTRL C to copy. So you could have copied all of that database optional but I’m going to connect to the Adventure Works 2014 and again you could right and click on the database, go to Rename and copy the name. Now we’ve got two different types of data connectivity mode import and Direct Query. Import is the one that you are more familiar with. So the tables and columns are imported into Power Bi desktop. It’s going to take a snapshot of it. So if the data changes you will have to refresh your snapshot.
Direct Query on the other hand, there is no data that’s imported or copied into Power Bi. It gets the metadata, it gets the tables and columns but just the names. And then when you actually run the query that’s when it asks the database for the query. Now there are advanced options as well. If you have a particular SQL statement to run, you can run it. If you have got access to an SQL Server that’s got failover support. So in other words we have a database which is connected to a second database so that if the first database crashes we can instantly use a second database. So there is no drop time in connectivity but you’d be told whether you’ve got that. However, this is more important include relationship columns. So remember that currently it’s checked.
We can also navigate using full hierarchy. I don’t recommend that I’ll show you the difference between the two. So leave all of this as is and click OK. As long as we’ve got the this include relationships column checked. So then a dialog box comes up asking okay, what table are you interested in? Now you might think well I’m interested in this table product and this table product cat to green, so forth.
If you do that you’ll get two separate queries. So if we’re trying to combine these product, product subcategory and product category and one we don’t want to select three separate queries, we just select one table. Let’s transform the data. So we go back into a Power query and here we have the source is a database and the navigation is the particular table. And you can see we’ve got in this case all of the data but it could just be the top 1000 rows and we’ve got the data with all of the columns. Now what I’m going to do is do just a little something which is looking at the native query. So if I right and click on whatever stage I’m on and view native query you can see that we have a select statement with all of these columns and then from the table, so we have table size as size. Now suppose I said I’m just really interested in the name, so right and click remove other columns.
Now let’s have a look at the native query. You can see the native query has really been reduced. So what’s happening is the computer going into the database retrieving all of this information and then the next step discounting all of the columns apart from one. The answer is no. That would be a huge waste of the computer’s time. Instead what it’s doing is something called query folding and that means it’s going to go through the steps and produce in computing terms the simplest query that it can use to get the data. So it’s not going to get all of this data if it has to.
At the moment it has got all the data but it’s only got the data for the preview for the first 1000 rows. If this was a database with a million rows then it hasn’t loaded all a million rows for all of the columns and it’s not going to because when I click close and apply it knows that all it’s got to do is load one column equally. If I then say, well I only want to keep the top 50 rows, then the query reflects this select top 50 name from this table.
So the computer will simplify the query to ensure that the minimum amount of data that it needs to do what you want is transfer. So that’s query folding and it’s done automatically. There are some things that will stop it from doing it but those are a bit rare. Mean for instance, doing something that is not easily supported in SQL Server language.
- Select a storage mode
Now I just want to have a further look at this data connectivity mod. So said in the previous video, import is for downloading all of the data at the beginning and Direct Query is for getting data only when it needs to. Now, in the DA 100 certification exam, it doesn’t call this data connectivity mode, instead it calls it a storage mode. So you can think about these as storage. So, do you want all of your data to be stored on your computer? Now let’s just have a think about why you might not. First of all, you would have to get all of the data as a snapshot, as I said in the previous video. Secondly, could that be a bit of a security problem if you have all of your data stored on your computer and maybe, let’s say you are connecting to this SQL Server database, maybe the security changes on the SQL Server, which means you no longer should have the right to access it.
Well, you’ve already got it, it’s already imported on your computer, so there’s nothing that the end DBA on the SQL Server, the database administrator would be able to do about this. But with Direct Query, this means that the security is always up to date because you are always getting the later security requirements. Additionally, your data is up to date. So let’s suppose it’s 09:00 in the morning and I grab all of the data. I would have the data as of 09:00 in the morning and then I would have to refresh to get the latest data. With direct query. I’ve always got the latest data. So whether it’s 09:00, 10:00, 11:00 now, the disadvantage of that is that I am constantly having to go back to the source. However, the advantage is that I’m not downloading all of the data.
Suppose I’m doing a select statement and manage to do use query folding and it’s actually quite a small contained data set. Well, that would work for the import as well. But maybe my Direct Query, my visualizations, I’m getting an even smaller set. So maybe I’m doing filters, maybe I’m doing something else. Direct Query just takes the data that you need when you need it. But there may be a bit of a lag because you having to ask SQL Server for it.
It really is a bit of a question. Which is better, getting everything at the beginning, saving local data so that’s the import one. You can also use things like Q and A and Quick insight because you’ve got the data, but you would have to do all of your refreshing manually. Or do you want to just get the data when you need it with the up to date data, with the up to date security. So you got these two different mods import and Direct Query. Now, for some data there is a third mod and that is called dual du Al or Composite mod. So what that means is, okay, computer, I want you to choose so some data can be directly imported, other data must be directly queried.
Now, just to let you know, you can change your data connectivity mode. So I’ve just loaded a table at random and if I hover over it, you can see the storage mode. Now, if I go to the model and click on the table, you’ll see that over here we’ve got properties. And if you go down to Advanced, you can see the storage mod. So whether it’s Direct, Query, Dual or Import. Now you’ll notice that Direct Query and Dual are grayed out. I have actually imported this table. If you go from Direct, Query or Dual to Import, it’s a one way route. You cannot change it from import back to direct query or dual. So if you get the data initially using Direct, Query or Dual, you can change it to Import. If, however, you’ve got it as an import, if your storage mod is import, you cannot change it back.
And the only thing you can do therefore is to delete the query and start again. So Storage Mod, the import mod allows you to create a copy of your data sets on your local computer. And it is the default Direct Query is for when you don’t want to save local copies of your data. Maybe for security reasons, maybe because you’ll find that it’s quicker if you’re using a huge data source and it always ensures that you have the most up to date data and that all security requirements from the source are met. And then there’s Dual or Composite Mode, which allows some data to be directly imported and of a data to be queried. So it’s once foot in both data connectivity mods or storage mods.
- Expanding multiple tables in SQL Server
What I’m going to do is say actually what I want is the name, I want the category, the subcategory in the category. So let’s move across and say I want the category and subcategory. Remember we’ve only imported one table, subcategory is in another table there is a reference to subcategory so let’s see if we can find it. There we are product Subcategory ID so there’s lots of nulls so I’m just going to take out the nulls so we can actually see some data. However 14, 21, 31 doesn’t actually tell us what the actual subcategory is. Let’s move along and you can see that we have got links to related tables. You remember when we looked at the source one of the advanced options was include relationship columns so the computer knows that one particular table is connected to another table using primary keys or foreign keys. It’s basically set up well. So what we’re going to do is go back to our data and we’re going to scroll across all the way to production product subcategory.
Now you can see that each of these particular cells contain multiple items including a table within a table, essentially a table within a list. So what I’m going to do is expand production product category subcategory and have all of these items. I’ll decide later what I actually want to keep. So let’s click OK and so we can now see that subcategory 14 is rolled frames, 31 is helmets, 21 is jerseys and so on. Now I also want the category which is related to the subcategory but it’s a different table and again we have a link to it. We also have a link back to product because the subcategory stuff that we’ve just got is related to the product but also the products related to the subcategory.
So it’s a bit of a loop if we go down that route but what I want to go is go further down the chain and get the product category so I’ll expand that as well. So now we can see that product category two is components for accessories and free clothing. Now we’ve currently got 53 columns an awful lot. We don’t want that many columns and if we have a look at the dean native query it’s fairly complex, it’s telling us all of the columns that we’ve got.
So what we’re going to do is say well I want the name, I want the list price. So I’m holding down control to select these individually. I’m scrolling across I want the subcategory name and I want the category name so right and click and remove other columns. So if we now have a look at the native query you might imagine it’s going to be a lot simpler.
Actually no it’s boding and boding and boding but in computing terms this is actually a lot simpler. Yes we have got references to all of these columns that we don’t want but SQL Server is intelligent enough to go actually these aren’t part of the final output so I’m not going to bother processing them. So you can let SQL Server take the strain of deciding what’s important. Now notice we’ve got production product subcategory name and production product subcontractory production product category name very long. So whenever you’re dealing with nested tables it’s probably good towards the end to just rename what you’ve got.
So let’s make a visualization based on this we will close and apply I’ll make a graph, a stacked column graph and within that we will drag the list price to the value, the category to the axis and the subcategory also to the axis so we can drill down. So we have got data from SQL Server and we can see that we can manipulate it as any other sort of data.
However, the simpler the data the better. So the fewer columns you need the better because the computer is then not going to have to retrieve all of that data. So the process that the computer uses is called query folding. I’m just going to remove this filtered rows because I don’t actually need it now so I know the query could break it’s not actually going to. And here is our final it looks hideous SQL query. It could be reformatted quite easily but this is actually fine in computing terms you can make it a lot more elegant but the end result will just operate just as quick. Now I said I would let you see the difference between an SQL Server where we are enabling full hierarchy.
So just to show you that, as I say, I don’t recommend that you do this, we then go through the various schemas like person like production and then we get into the various tables. I personally find it a lot easier to just go through it in order of whatever orders it gives. It seem to give a bit of a random order when we looked at it but this is an alternate way of looking at things. But as I said before, just select one table and as long as you have got include relationship columns you can use that one table to drill down to other tables as appropriate and you can import the data which is the default. Or you can use direct query, which will always ensure you have the most recent data. But, of course, it takes longer to actually get the most recent data. It will be refreshing quite often.
- Importing data from SQL Server Analysis Services (SSAS)
Now in this video, we’re going to be looking at importing into Power Bi from SSAs, which is SQL Server Analysis Services. The difference between SSAs and SSMS is that for a standard relational database, it is more geared to getting data data in. So getting data out is a bit slower. Wavessms a cube, this is the technical name, is created of pre aggregated data. So what happens is you identify when creating the cube. These are the important things and let add up things that relate to it. So aggregation are your counts, your sums, your averages and so forth.
So it is the designer of the cube that restricts the huge amount of data that’s available within the original database and says this is a set of data which is much quicker to query, which has all of the things that you really need. So what I’ve done in Visual Studio 2019, I’ve just created a quick cube based on the Adventure Works. So there are two sets of things dimensions and measures. Measures are things that you can literally measure. So there are things that you can sum, that you can average, that you can min, you can max. Dimensions are things that can’t necessarily be measured. So there might be things you can count, but that’s about it.
So I’ve got a dimension here for subcontractory. So I’ve got subcontract categories name. I cannot add together two of ten subcontract categories name and come up with something that actually makes sense. Similarly, I’ve got product category. ID. So two, three, four. Yes, they’re numbers, but in reality I don’t add them together because there is no category ID nine for instance. It just relates to a particular category. So what I have done here in Visual Studio is created a cube full of analyses that have already been created. So what I can do is I can drag out, say the subcategory name and I can go into measures and get the total list price. So I’ll execute the query.
So what I’m going to do is put the subcategory name into the list price. It’s currently in a photo. So here we have the subcategory’s name. Then I’m going to add the category name and so on. So it is already processed analyses that the computer can create other analyses from. So let’s see how you can import that into Power Bi. So get data and analysis services. So you need to connect to a particular server. So again, I’m going to connect to Dot, and you’ve got import or connect live. So Connect Live is the equivalent of direct query for analysis services.
You can also write your own query. If you do, be careful. It’s not using TSQL, it’s not using any sort of SQL language. It’s using MDX or DAX. And we’ll be using DAX in the next part of this course. But it depends what sort of cube you’ve got. There are two versions of SS, so let’s connect and you can see that I have my Cube AdventureWorks 2014 which contains the following dimensions and measures. So click OK and notice that we have all of these measures and products in categories, exactly as we’ve got in Visual Studio. So here we have all of the measures and here we have all of the dimensions. Also notice I can’t get data. So once you have imported something from SS, you can’t combine that with other pieces of data. It is the one data set and edit queries.
You can just go in and edit which database you’re looking at. So once you have imported the data, then again, creating the information is fairly straightforward, creating your visualizations. So get what visualization you’re talking about. And again, I’m going into product measures. That’s something you measure and put in the value. And I’m going into category and get in the category names and the subcategory names and then I can drill down just like it did in the previous example. So SSAs is pre aggregated precalculated analyses that the computer can then build up into your own analysis. So it’s generally a lot faster and a lot easier to use in its native form. But when you get to something like Power Bi, it’s the same user interface. So once you’ve imported the data, then you just drag and drop as before. Now, some of the more advanced features within SSAs might not be available in Power Bi. For instance, named Sets and Actions, they might not be available. But the majority of the stuff that you’ll be using on a day to day basis, you will be able to use in Power Bi. So that’s how import data from SSAs.
- Setting up Azure SQL Database
Over the next two videos, we’ll be connecting to Azure Azure SQL database. Now. What is azure? Well, Azure is a managed cloud database. So it is Service, which is this final S, but it’s also Software, which is his first S software as a service. Now it is something that is not free, but you could get a free introduction to it. It is SQL on the web and you can see how Microsoft is marketing this. For instance, easier to maintain because it’s maintained on the internet. Microsoft can give it the latest packages and so forth. So in this video, let’s get an SQL SEO database. So if you’re not interested in actually how you actually get an SQL Azure database, then skip the next video while we’ll be showing you how to use it. So I’m going to start now with a free credit.
This one says this offer might not be available, but notice one of the things you can do for free for twelve months is have an SQL database up to 250GB. And that is quite a lot. So I’m going to start for free. I had to reload it in a different tab. It just came up with a blank page, so I’ll type in my details. So this is my Microsoft account that I’ve already got set up. So now I’ve done it, it’s loading up the window. And then after quite a long wait, you can see that it’s asking you to put in your personal information. This is the identity verification by phone information. And you can see it either phones you or text you. And it just texts me a six digit verification code. Then it’s asking you for your credit card information. You won’t be charged unless you upgrade. So it’s just to verify your identity and you can see the various information it’s asking for you.
And then you’ve got a subscription agreement. Lots of legalese, but I always suggest working through those just to have a look, see what the free account credits can be used for and the privacy statement. So lots to read, but I do suggest reading it, just having a look at what’s there. So once you’ve done that, check it. You may wish to receive information, tips and offers from Microsoft or select partners. Well, I’m going to say yes, just in case there’ll be something useful. And click sign up. Now there was a lot of waiting, lots of these wheels going around, these bubbles going around. So it was setting up the account and then it says, welcome to Azure.
So we can then create an Azure SQL database in the Azure Portal. So let’s create an SQL database in the Azure portal. And it does have this quick start, but let’s see this in action. So first of all, we need to sign into the Azure Portal. So there was a link in that window to do so. But here it is. And you can see there’s a tour. You can create, you can go on, I’ll say maybe later. So let’s create an SQL database and you can see that there is additional training if you want as well.
So let’s create a database. So it’s part of my free trial subscription. And I haven’t got a results group, so I need to create a new resource group. So I’m going to call this first resource group. So that’s just a container that holds everything that it needs. Now let’s create a database itself. So we’re going to call this first database and a server. So let’s create a server. So create new server. So server name. So my first server. Now, bear in mind it’s got to be all lower case. If it’s in capital letters, then it won’t work. Server login. So the administration. So I’m just going to call it as your admin login and a password.
So you can see it needs to be a complex password. And the location, well, that’s down to where you are. I’m going to say UK West, except I find that I’m not allowed to with my free subscription. So let’s go for us. East. And you can see the specified server name is already in use. So I need to just add a few letters to the end of it. And I’m going to put 2022. There we go. So click OK, I’ll save that information. So do I want to use SQL elastic pool? The answer is no. The compute and storage, it’s probably all okay. So I’m just going to leave it at that. So there we go, instantly.
What’s in there might be a nice clue. How much is this going to cost a month? So if I go down to a basic thing, you can see it’s costing about $5 a month, so not that much at all. So I’m going to go for the basic and then I’m going to scroll all the way to top and click on additional settings. And I’m going to use sample data for this and it’s adventure works light. That is the standard sample data and I believe the rest of it will all be fine. So let’s review and create.
So you can have a look at all of the settings. So let’s create and the computer is now creating your database. And there it is. Your deployment is underway. So nothing more to do. And we’ll just pause the video until it’s actually finished. And in about three minutes the deployment deployment is complete. And the next step go to the resource. So here we are in my first database, which has already got some data in. And in the next video we’ll have a look at how we can use this data in power bi.
- Using Azure SQL Database in Power BI
Now in the previous video, which you may have skipped, we created an Azure SQL database and we clicked on Go to Resource which opened up our database. So what do we do from here? Well, you’ll notice that there is a quick start which gives you a few ways to connect. A fairly quick start is to go to the Power Bi preview. So if I click on that, you can create a report from there or it downloads something that enables you to create a report. So if you click on Create a report and it gives you a small file which you can click on, and it opens up Power Bi and fills in part of the information. Now there are two traps coming up and we just need to guide you through both of them to be able to use. This is your SQL connection. So this is what you get.
And in fact, the file you downloaded is a very simple file. All it really says is the name of your database. Please don’t try to use this database. You won’t be able to connect because the username and password you don’t know, or at least you don’t know the password. Now the thing is, this is the first trap about Azure connection to Azure SQL Server is that it gives you this Windows credentials. Please use the alternative credentials. Well, we know what my credentials are. My credentials are username this and password, which I will just type in. So these are the correct credentials. However, we cannot authenticate with the credentials provided.
That’s because we’re trying to use Windows credentials, they don’t work on an Azure account at the moment, no doubt Microsoft is trying to make it work. Instead we have to go into the Database tab. So that’s the first trap that you don’t just enter it into the Windows tab, you enter it into the Database tab. So let’s do that again. The second trap is what happens when you press Connect. One of the errors that you might get is that you cannot get through the firewall. And that’s because the Azure firewall may not be set up to allow your particular IP address in. Now the good news is this is very simple to correct if you’ve got access to the Azure database or if you know someone who has. So we go to overview at the top Set Server Firewall and you can see that there are no firewall rules configured.
So what we need to do is add a client IP and there’s my IP and you will also notice that that address is exactly the same as is on this error message. So as I say, very easy to set up. So we’ll just save this and it may take up to five minutes according to this, but you saw how quick it actually was. So let’s retry that. So now that we have successfully connected to the Azure SQL Database, the rest of it is as per connecting to an on premise database. So let’s say I want the product as I did previously. Then I will click Transform data because my connection is using direct query. Then it downloads the data when it needs. And this is the perfect time for you to manipulate the data so that it can send an SQL query of however many columns, however many rows it needs.
After you have done all of your manipulation, like we did a few videos ago, so you could say I want the top 50 rows, for instance, and what will happen is the computer will query fold it into a more efficient SQL statement. So connecting to Azure Database is very easy once you get past those two initial traps. And once you’ve done that, then you can create whatever visualizations you would like to in exactly the same way as you’ve done before. So this is, I think, one of the really good things about connecting to various sources. Once you’ve done that, it all looks the same way. Now, just to point out that if you do get data through the normal route, so get Data SQL Database, then you will have a slightly different box. But this looks like the box that we had previously.
So we need when we’re looking at SQL Server on premises. So I’ll connect to my server. We use direct query. We can use any of these additional items, choose not to click OK. Then it may ask for username and password. It doesn’t in this case because it already knows it, because I’ve used it before. And then you can connect. So just a slightly different dialog box, the more standard dialog box. If you do, go get data. Azure Azure SQL Database So this is how in the past two videos we have seen how to create an Azure Database and we’ve seen how to connect to Azure Database and create visualizations from it.
- Using Big Data
Now in this video, I just want to briefly talk about big data. What is big data? Well, Excel, if you open up Excel, you will see that there are a million and 4000 rows. So really big data is things greater than that, but 2 million, okay, that is big data, but we’re generally talking in the billions of rows or maybe even more. But you can see here we’re talking, I think that’s the one beyond terabytes of data. And obviously with this sort of information, the data, we don’t download it first because that would be a huge amount of data to do. So now, what data platforms are there? Well, Azure of course is the one that Microsoft favors and so you should know Hadoop and Spark or at least you should be familiar with them. And there’s also an analytics team called HD Insight.
But these are the main players. So you’ve got Microsoft Azure, Cloudaria, Scissors and Tableau is a number visualization product. And Oracle, they do lots of databases. MongoDB is another version of databases. So these are the main platforms just to be aware of. It says to no, I think that’s a bit of ancient statement because you’ll probably be using one or maybe at a push two sources. Now, if you want to see a more sophisticated version than what I’m going to show you, if you want to see something using perhaps HD Insight, then have a look for Azure big data. And you can see here the first thing after this ad is big data and analytics. And there’s a webinar that you can watch if you want to see more different types of big data. So they will obviously being a Microsoft product, they will look at the Azure Hadoop and HD Insight and so forth.
Now, I’m going to take a very limited huge data set and there is a website called Catalog Data Gov. So this is a US website with huge amounts of data. And what I’m going to do is look for the NYC taxi zones. So there’s one further down called the 2018 Green Taxi Trip data. So that includes fares, details of pickoff and drop off points and so forth. Now I previously downloaded it because it is a huge download as you might expect from big data and it’s 812 megabytes. So we’re going to do is we’re going to load that into power bi. So let’s just have a quick look at how you load other stuff and it’s basically find it. So you’ll see here various Azure’s there’s data sets from Data World.
That’s an example of big data. And if it’s not here, then it will probably be either in development or you could download additional certifiers later on perhaps. So what I’m going to do and here is the Azure. So here we’ve got Azure HD Insight which we looked at earlier. So what I’m going to do is I’m going to load this particular file the CSV file. Now, notice, of course, it’s just going to be a demo of it at the beginning. It’s going to be a sample set. So I’m going to transform the data and this is generally what you then do. So, for instance, suppose I wanted to know the amount of distance, the amount of fare. It’s interesting the amount of information you can get. So here’s the total amount per day. Well, what we can do is use everything that we have previously looked at.
So suppose I wanted just to extract the date from that. No problem. Go to date only, so that strips out the time. Now I just want this plus the fare. So there was a total amount over here. So I’m going to remove the other columns and then I’m going to group by. So I want to group by the data was picked up and I want to sum the total amount. So total fare. I’ll call this. So you can see the computer is reading all of the data now because it has to. So it’s on at this point that it doesn’t do the sample first thousand rows, it now has to read all of the data.
Now, if this was a different connection, if this was a connection to an online database, then you’ll probably be sending through an SQL statement which will be processed at the database end. So you get a small set of data. So here, now we can see going to rename this as date and the total fare. So going to close and apply. And again, you just use exactly the same techniques that you’ve had for visualizing everything else. So once the data is in the system, everything that you’ve learned so far you can just use.
So eventually the data gets loaded. So I’m just going to create a column chart, put in the date hierarchy into the axis, total bear into the value, and we can drill down, drill down, drill down, so we can see what the most popular day is for fares. Perhaps this is the most popular date, so we can see it’s in the middle of the month. Alternatively, we can drill through.
- Use the Common Data Service (CDS)
In this video, I’d like to talk about another connector. And this connector is the Dataverse. So just like all the others, we go to getdata more dot, dot, dot. And halfway down on the left hand side, we’ve got Power Platform. Now, the Dataverse is the new name from November 2020 onwards of the Common Data service. So if you see lots of references to to CDs, that is now called the Dataforce. So what is the Data force? Well, it’s a database on the cloud. More specifically, power. Bi is part of the power platform. This is a series of services that Microsoft has put together.
So we’ve got Power Bi for business analytics, we’ve got Power Automate, which automates processes or flaws, and Power virtual Agents, which allow for chat bots, automated messaging, and it uses artificial intelligence to understand your answers. But in addition, there is something called Power Apps. Now, Power Apps aren’t the basis of this particular certification. That is more the PL 900. But if we were to look at a sample Power app, for instance, we would be able to see information that’s on the web in a database in this particular instance. And here I can edit it and you can see I’ve done conditional formatting with the background and with the font size and that sort of thing. So this is an example of a very simplistic app. However, what is driving this information? This information is being driven by database in my environment.
And the database consists of a series of tables which used to be called entities, which naturally have got columns, but these used to be called fields. But basically it’s tables, columns, rows nowadays and you can see the sort of things that we’ve got. So what I’m going to do is connect to this particular database. I don’t expect you to be able to do this unless you’ve actually got a Power Apps account, but it might just be of interest of how to actually connect. So I go to the start up here and I click on Session Details and this gives me a URL to which to connect to.
But it’s not the entirety of this. I don’t want anything either side of the slashes, so I just want everything except Https and then a slash at the end. So I’m going to copy that and I’m going to go back into Power Bi. So I’ll select Data Verse and you should always select Data Verse unless you’ve got a reason for going for the Common Data Service. One possible reason is you’re importing a huge amount of data. At the moment, Data Verse isn’t quite set up for that. So I’m going to connect and it’s asking me for my environment domain. This is it. And then click OK.
Now what will probably happen next is you’ll get a dialog box like this. We encountered an error while trying to connect a TDs protocol. Endpoint is disabled for this organization and is disabled by default. So let’s enable it. So I go to my PowerApps, I go to the Admin Center, I go to my particular environment. My environment is PO 900, I go to Settings, go to Product, go to Features and then enable TBS Endpoint. So I’ll just click that on and save. So now when I connect, here is my account and if I go down to my contacts, which is what we were looking at earlier, we will be able to see it and be able to load and transform data and all the rest of it.
Of course, you’ll also need to make sure that you’ve got appropriate permissions to be able to access the data. So it is actually quite a big job to get it set up initially. And this particular certification DN 100, you don’t need to know any of that. All you need to know is that the database is a database which is used in the rest of the power platform. So it can be used in power apps, it can be used in Power automate and you can use it in power bi by going to getdata more dot or going down to the Power platform. It is in the old as well. But if I go down to the power platform, there it is, the data verse.