DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Scala, Notebooks and Spark part 1
- Section Introduction
Hi and welcome to this section, in which we are going to have a primer on Python and Scala, one video on notebooks (Jupyter notebooks), and then a look at the Spark pool that is available in Azure Synapse. Now, by no means whatsoever is this supposed to be an extensive course on using Scala or Python. The reason I'm including it, as students who have taken my courses will know, is that I want to set a base so that you understand what I'm going to be discussing in future chapters. In particular, I want you to understand what I'll be doing when working with Spark pools in Azure Synapse, because there we are going to build notebooks that contain code written in Scala and in Python. For that, I just want to have a quick primer.
So it's not extensive; it just covers a few constructs in Python and Scala. It gives you a base, a foundation: we know that Python is there, we know Scala is there, and we can see how to use these languages in notebooks and in the Spark pool. From there, you can go ahead, do more research, learn more about Python or Scala, and see how to work with notebooks. So, as I said, let's start with a quick primer on Python, Scala and a little bit of notebooks, and then we'll move on to the Spark pool.
- Introduction to Scala
In this chapter, I just want to give a quick introduction to Scala. In the next chapter, we will see how to install Scala on our local system. Scala is a strongly typed, general-purpose programming language. It supports both object-oriented programming and functional programming. Scala source code can be compiled into Java bytecode, which then runs on a Java virtual machine (JVM). That is why, when we look at the installation of Scala, you will see that we also need the Java runtime in place on our local system. Scala has been around for quite some time; it was first released in 2004. In the next chapter, we'll be installing Scala along with the REPL.
The REPL is a command-line utility that can be used for running Scala commands. REPL stands for Read-Evaluate-Print Loop. The REPL first reads the command that you issue, then evaluates it, then prints the output, and then loops back to read more commands. A REPL is available for a number of programming languages. So in this chapter, I just wanted to go through an introduction to Scala. In the next chapter, we'll look at how to install Scala on our local system.
- Installing Scala
In this chapter, we are going to see how to install Scala on a local system. First, I need to have the Java platform in place on my local system. Scala depends on Java, so we have to install Java before we can install Scala. For your information, I'm on a Windows-based system. So here I am going to the downloads page on oracle.com. For Java, I'm choosing the version that is currently available. I'll click on JDK Download; this is the Java Development Kit. I'll scroll down to the Windows x64 installer and download the exe. I'll review and accept the license agreement, download the exe, and hit Keep. Once the download is complete, I'll click on the exe.
This should then start the installation of the Java Development Kit. I'll click Next. I'll leave the location as it is and click Next again. It will now start the installation process. Once this is complete, I'll hit Close. If you now open up a command prompt and check the Java version, you can see the version of Java in place. Next, I'm going to the download page for Scala. I'll download Scala version 3, so I'll click on it. Here we have some different options for installing Scala. The first thing I'll do is download Coursier. The page shows the different installation steps, and since I'm on a Windows-based machine, I need to copy this command. I'll just copy it.
I'll go to the command prompt and right-click to paste it here, and it should now start downloading whatever is required for installing Scala. Once this is complete, I'll go back to the previous page. Now I want to install the Scala compiler and the Scala interactive read-eval-print loop. So I'll copy this command and place it here. This is done. We can see it's giving us a warning that this location is not in our path. So let me copy it: in the command prompt I'll go to Edit and hit Copy. Then I'll go to Control Panel on my local system, go to System and Security, then System, scroll down and go to Advanced system settings.
I'll go to Environment Variables, select Path under the system variables, click New, add that location, and hit OK, OK and OK again. Next, I want to install the REPL, so I'll copy this command and add it here. Now we have the REPL in place as well. It's giving us the same warning because we need to close the command prompt and open it again so that it picks up the new path. So I'll close the command prompt, open it again and run it as an administrator. Let me go to the Temp folder, and I can now start the REPL. And now we can start issuing Scala commands. So in this chapter, I wanted to show you how you can get Scala installed on your local system.
When you go on to working with Spark pools in Azure Synapse, which you will see later in this section, you will find that you don't need to install anything to issue Scala commands against a Spark system. You can use the notebook experience to start issuing Scala commands, and in notebooks you can also issue Python commands, SQL commands, et cetera. For each of Scala, Python and Spark, we are going to look at the initial setup. One of the benefits of having a managed service in Azure is that you don't have to go through this whole setup process to start issuing commands; everything is installed for you. So this marks the end of this chapter.
- Scala – Playing with values
Now, in this chapter, I just want to go through a few commands in the Scala REPL. Then, in a subsequent chapter, I'll show you how we can use an integrated development environment, IntelliJ, to also work with Scala. So here in the Scala command line, let's say I just type "hello". It has returned something to us: we have something known as val res0, followed by the type. Scala is a strongly typed programming language, so you have given a string to the REPL, it has returned it back to you, and it has shown the data type of what you have given.
If I input a number, you can see that it has returned the data type Int, which means an integer. If I enter a Boolean value, you can see it has returned the type Boolean. You can even start performing operations here; for example, an addition has given the value of five. I can also concatenate strings. And you can see that with every input we give, the REPL is actually defining a new variable, so you can reuse the values. If I just type in res0, you can see it returns the value: a String equal to hello. And if I go to res1, we get another variable.
Here we can see the integer. We can also define our own variables, so I can assign a value of five to a variable named x. Next, I can define another variable, add the two of them, and get the output as desired. So if you want to try simple commands and get started with Scala, to see what the data types are and how it works, you can start with this REPL command-line utility. In the next chapter, as I mentioned, we'll go through a tool that you can use for creating small snippets of Scala programs.
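The REPL session above can be approximated as a short Scala worksheet or script. This is a sketch, not the exact commands typed in the video: the number 42, the operands 2 + 3 and the second variable y = 10 are assumptions chosen for illustration. Pasted line by line at the scala> prompt, each line echoes its name, inferred type and value.

```scala
// Literals: the REPL infers a type for each one
val greeting = "Hello"           // String
val count = 42                   // Int (assumed example number)
val isReady = true               // Boolean

// Simple operations (2 + 3 is an assumed example that yields 5)
val total = 2 + 3                // Int
val joined = "Hello" + " World"  // String concatenation with +

// Naming values and combining them (y = 10 is a hypothetical value)
val x = 5
val y = 10
val sum = x + y
println(sum)
```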
- Scala – Installing IntelliJ IDE
Now in this chapter, I want to show you how you can download the free version of IntelliJ IDEA. It is normally used for building Java-based programs, but you can also build Scala programs with it. So on the IntelliJ homepage, I'll click the Download button, because I want to install the free Community Edition. If you scroll down, you can see that with the Community Edition you can develop Scala-based programs as well. So I'll download the exe, hit Keep and let it download. Once we have the exe in place, I'll click on it to start the installation process for IntelliJ. I'll click Next, then hit Install. Now it's saying that my computer needs to be restarted in order to complete the installation.
So I'll choose the option to manually reboot later, reboot my system, and then come back. Now I've restarted my system. I'll start IntelliJ from my Start menu, confirm that I've read the license agreement, and hit Continue. Here I'll go to Plugins. I'll first install Scala and then restart the IDE, so I'll hit Restart. Then I'll click on New Project, choose Scala, choose sbt and go to Next. On the next screen, I'll choose a location, give a name and hit Finish. Windows Defender Firewall is asking me to allow access for IntelliJ, so I'll click Allow access.
I'll just close the tip. IntelliJ is going to do some work in the background before we can start using it; you can see it's downloading some prebuilt shared indexes. Let's wait until this is complete. Once the download is complete, I'll close this. Now I'll also update the Scala plugin. So I'll go to File, then Settings. In the Plugins section, under Installed, I'll go to Scala and click Update All. Once this is complete, I'll hit OK and restart the IntelliJ IDE. Then I'll right-click on my folder, choose Add Framework Support, and choose Scala. Here there is no library selected.
So I'll hit Create, then Download, then OK, and now I can see that we have a global-level library. I'll hit OK. Now, when I go to my src folder and choose New, I can create something known as a Scala Worksheet. I'll give it a name and hit OK, and now we can add some commands. I'll just print Hello World, run this, and we can see the result here. So it's a long process to get this up and running in the IntelliJ IDE. As I said, when we work with notebooks in the Spark pool or in Azure Databricks, everything will be configured for us, and we don't have to go through this exercise of setting up Scala.
- Scala – If construct
Now we'll look at some short videos on some of the commands that are available. First, I'll delete my Hello World worksheet and hit OK. Then I'll go to my Scala folder and create a new Scala worksheet. I'll give it a name and hit OK. Here in VS Code, let me look at my first set of statements. I'm creating a variable with the name i and giving it a value of nine. Then I'm using an if condition: if the value of i is less than ten, print this statement; else, print the other statement. I'll take these commands as they are, go to my worksheet, and run this. Here we can evaluate the worksheet.
Here we can see our variable i, with the data type Int, and the desired result. When defining a variable, we can use either var or val, so val i = 9 is also possible. If I evaluate this worksheet, I get the same result. Now, what is the difference between var and val? If I try to change the value of i, you can see we are getting an error: reassignment to a val is not possible. val defines a variable that is immutable, meaning you can't change the value of i, whereas if I change it back to var, we don't get an error, because we can change the value.
If I evaluate my worksheet, you can see the desired results. You can see that it's also showing that the value of i was mutated, meaning it had changed; i is now equal to ten. Note that our condition only checks for less than ten, without the equals clause, so the output says the number is more than ten even though the value of i is equal to ten. To fix this, you can add the equals sign to the condition and change the message to say the number is less than or equal to ten. Let's run this again, and you can see the output as desired. So in this chapter, I wanted to show you the if construct, which lets you run a set of statements based on a particular condition.
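A minimal worksheet sketch of the val/var difference described above. The names fixed and counter are hypothetical; the commented-out line is the kind of reassignment the compiler rejects.

```scala
// val: immutable, the binding cannot be reassigned
val fixed = 9
// fixed = 10  // would not compile: reassignment to val

// var: mutable, the binding can be reassigned
var counter = 9
counter = 10
println(counter)
```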
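The corrected condition might look like the following in a worksheet. Wrapping the if/else in a small function, hypothetically named describe, is an addition here so the same check can be run for both 9 and 10; the message wording is assumed.

```scala
// if/else is an expression in Scala, so it can return the message directly
def describe(n: Int): String =
  if (n <= 10) "The number is less than or equal to 10"
  else "The number is more than 10"

var i = 9
println(describe(i))

i = 10  // mutate i: with <= the boundary value 10 takes the first branch
println(describe(i))
```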
- Scala – for construct
Now, the next construct I wanted to go through is the for construct. Here, for values from one to ten, I want to perform some action: in this case, just print the value of i. I have a string that says "The value of i is", and I'm using the plus operator to concatenate the value of i to this string. So for each value of i, it is going to execute this statement. Let me take this, place it in the worksheet and execute it. If you just hit the plus symbol here, you can see all the values of i. If I remove the var statement and run this again, you can see that you don't even need a var statement to define i, because i is scoped to the for statement itself. So in this chapter, I wanted to show you how you can use the for construct to run a statement a certain number of times.
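A sketch of the loop described above, assuming the range 1 to 10 and the message text from the narration. The intermediate name values is hypothetical; in the video the range is written directly in the for statement.

```scala
// 1 to 10 is an inclusive Range: 1, 2, ..., 10
val values = 1 to 10

// i is introduced by the generator, so no var or val is needed
for (i <- values) {
  println("The value of i is " + i)
}
```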
- Scala – while construct
Next, I wanted to go through the while construct. Here I am again defining i, this time with var, because I need to change the value of i inside the while loop. While the value of i is less than or equal to ten, print the value of i, and at the same time increment i. If we don't increment i, the loop will never end. So let me copy this, place it here and execute it. You can see all the values of i. So this is another way to execute a set of statements a certain number of times, but here the loop is controlled by a condition that is checked before each iteration rather than by a fixed range of values.
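The while loop described above can be sketched as follows, assuming i starts at 1 (the starting value isn't stated in the narration).

```scala
var i = 1                  // var, because the loop body reassigns i
while (i <= 10) {          // condition is checked before every iteration
  println("The value of i is " + i)
  i += 1                   // without this increment, the loop would never end
}
// i is now 11, the first value that fails the condition
println("Loop finished with i = " + i)
```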
- Scala – case construct
Now, in this chapter, I briefly want to go through the case statement. If there are multiple conditions that need to be matched, instead of using the if construct you can use the match and case construct. Here I am assigning a value of 100 to i and then matching on the value of i. If the value of i is less than 50, I want to print this statement; if the value of i is greater than 50, I want to print this other statement. So let me copy this, place it here and hit Execute, and I can see the output as desired.
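A sketch of the match expression described above, with classify as a hypothetical function name and assumed message text. The final wildcard case is an addition not described in the video: with only the two guarded cases, a value of exactly 50 would match neither and throw a scala.MatchError at runtime.

```scala
// Each case can carry a guard (if ...) that must hold for the case to match
def classify(n: Int): String = n match {
  case x if x < 50 => "The value is less than 50"
  case x if x > 50 => "The value is greater than 50"
  case _           => "The value is exactly 50" // fallback so 50 is covered
}

val i = 100
println(classify(i))
```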
- Scala – Functions
Now, in this chapter, I just want to go through functions briefly. When you have code that needs to run over and over again, you normally embed that code in a function and then invoke the function any number of times. For example, here I am creating a simple function with the name add. This function takes two parameters, x and y, and both are of type Int.
And I'm just returning the sum of x and y. The return type of the function is also Int. Then, in a print statement, I can invoke the add function and pass in values: 10 becomes the value of x and 90 becomes the value of y. Embedding code in a function like this means you can call it at any point in time. So I'll copy this and run it, and you can see the output as desired.
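The add function described above might be written as follows; this is a sketch consistent with the narration rather than the exact code from the video.

```scala
// Two Int parameters and a declared Int return type
def add(x: Int, y: Int): Int = {
  x + y // the last expression in the body is the return value
}

// 10 is bound to x and 90 to y
println(add(10, 90))
```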