All Microsoft Data Science DP-100 certification exam dumps, study guides, and training courses are prepared by industry experts. PrepAway's ETE files provide the DP-100 Designing and Implementing a Data Science Solution on Azure practice test questions and answers, exam dumps, study guide, and training courses to help you study and pass hassle-free!
Classification
14. SVM - What is Support Vector Machine?
Hello and welcome. Today, we'll look at the support vector machine algorithm. So, what exactly is SVM (support vector machine)? SVM is a supervised learning algorithm, and it can be used for both regression and classification. However, it is mostly used for classification. Support vectors are nothing but the coordinates of individual observations, and the observations are plotted in an n-dimensional space, where n is the number of features. In SVM, the observations are separated in this space by a hyperplane for classification purposes. Okay, let's look at that with an example and go over it in detail. First, let's define a vector and see how support vector machines work.
Well, we have these points plotted on two features, x1 and x2. The arrow that travels from the origin to a point is nothing but a vector. A vector has a direction and a magnitude, which is nothing but the length of the arrow. Now let's try to visualise it in three-dimensional space, that is, with three features: x1, x2, and x3. All right, there is a point P with these coordinates, and this arrow is nothing but the vector. So if we have these observations, a hyperplane passing between them can separate the points, and that is what the support vector machine does. We have a set of observations, as shown here, and a hyperplane can separate them into two classes. But how do we know which hyperplane to choose, given that we can draw an almost infinite number of them? That is where the support vector machine is going to help us. Here we have three hyperplanes, each separating the two classes correctly. So which one do we choose? We choose by maximising the distance between the hyperplane and the nearest data points of both classes.
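To make the idea of a vector and a separating hyperplane concrete, here is a minimal sketch in plain Python. The point P, the hyperplane coefficients, and the function names are made up for illustration; they are not values from the lecture slides.

```python
import math

def magnitude(v):
    """Length of the arrow from the origin to the point v."""
    return math.sqrt(sum(c * c for c in v))

def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# A hypothetical point P in three-dimensional space (features x1, x2, x3).
p = (1.0, 2.0, 2.0)
print(magnitude(p))  # 3.0, since sqrt(1 + 4 + 4) = 3

# A hypothetical hyperplane: x1 + x2 + x3 - 4 = 0.
w, b = (1.0, 1.0, 1.0), -4.0
print(side_of_hyperplane(w, b, p))                # 1
print(side_of_hyperplane(w, b, (0.5, 0.5, 0.5)))  # -1
```

The sign of w.x + b is all a linear classifier needs at prediction time; the question of which w and b to pick is what the margin answers.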
This distance is called the margin. Now, as you can see, hyperplane C is at the maximum distance from the nearest data points of the two classes. Hence we choose C as the hyperplane, and these two points are our support vectors. Another problem, however, is that a hyperplane with a different direction may satisfy this condition and yet not separate the classes correctly. Then what do we do? Well, we solve it by identifying the accurate classes first, before the margin calculation. So our first priority is to predict accurate classes. I hope that explains the concept of the support vector machine. In the next lecture, we will go through the implementation of SVM using Azure Machine Learning. Thank you so much for joining me in this class, and I will see you in the next one. Until then, have a great time.
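The selection rule described here (correct classes first, then the widest margin) can be sketched as follows. The six toy points and the four candidate hyperplanes A to D are assumptions invented for this example; a real SVM solves an optimisation problem rather than comparing a handful of candidates.

```python
import math

def signed_distance(w, b, x):
    """Perpendicular distance from x to the hyperplane w.x + b = 0, with sign."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Distance to the nearest point, or None if any point is misclassified."""
    distances = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    nearest = min(distances)
    return nearest if nearest > 0 else None

# Toy data: three observations per class, labelled +1 and -1.
points = [(3, 3), (4, 3), (4, 4), (1, 1), (1, 2), (2, 1)]
labels = [1, 1, 1, -1, -1, -1]

# Hypothetical candidate hyperplanes, each given as (w, b).
candidates = {
    "A": ((1, 0), -2.5),  # vertical line x1 = 2.5
    "B": ((0, 1), -2.5),  # horizontal line x2 = 2.5
    "C": ((1, 1), -4.5),  # diagonal line x1 + x2 = 4.5
    "D": ((1, 0), -1.5),  # puts (2, 1) on the wrong side
}

margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}

# First priority: keep only hyperplanes that classify every point correctly.
valid = {name: m for name, m in margins.items() if m is not None}
# Second priority: among those, take the widest margin.
best = max(valid, key=valid.get)
print(margins)
print("chosen hyperplane:", best)  # C has the widest margin on this data
```

Hyperplane D is rejected before any margin comparison because it misclassifies a point, which mirrors the "accurate classes first" priority.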
15. SVM - Adult Census Income Prediction
Hello and welcome. In the last lecture, we understood what a support vector machine is and how a hyperplane creates the classification. Support vector machines are among the earliest machine learning algorithms. Although recent research has developed more accurate algorithms, this algorithm can work well on simple datasets when your goal is speed over accuracy. As Microsoft's implementation does not use a kernel function, which can significantly improve accuracy, the SVM in Azure ML does not give you very good accuracy compared to the other algorithms. In today's class, using the Adult Census data, we will create our first support vector machine model.
Because we have already built the experiment for this dataset, I'm going to save time by simply opening it, which is right here. Let me just open it. All right. Now there are two things I can do: either I can save this under another name, or I can simply add SVM to the same experiment, because we may need this model for future comparisons as well. I'm simply going to save this under a different name. Okay, I'll give it a new name, Adult Census SVM, and click OK.
All right. We have already completed the data processing steps: cleaning the missing data, selecting the columns for the experiment, and editing the metadata to convert the columns to categorical. We have also split the data in a 70/30 ratio, and we are ready to apply the SVM. Let's adapt this experiment to support vector machines. So let's copy and paste the Train Model and Score Model modules and arrange them nicely over here. Let's now add our model, which is a two-class support vector machine. So I'm going to search for support vector machine and choose Two-Class Support Vector Machine. Let's make the connections.
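Outside the drag-and-drop designer, a 70/30 split can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what the Azure ML split module does internally; the function name, the seed value, and the ten dummy rows are assumptions.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=123):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten dummy rows standing in for census records.
rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))  # 7 3
```

Fixing the seed means every rerun of the experiment trains and scores on the same rows, which keeps comparisons between models fair.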
The SVM goes to the Train Model module along with the training data, and the test data goes to Score Model. All right. And let's connect Score Model to Evaluate Model. Now let's go through some of the parameters that this model requires. The first one is the trainer mode. We have seen this before, and we will use Single Parameter with a specific set of values. The second parameter is the number of iterations for the model to run before it converges on the optimal solution. The next parameter is lambda.
This is the value to use as the weight for L1 regularization, which we saw in logistic regression as well. As we know, a larger value penalises more complex models, so too large a value can lead to underfitting. We will keep the default for now. Normalize features is another parameter, for when you want to normalise the features before training. If you apply normalisation, then before training the data points are centred at the mean and scaled to have one unit of standard deviation.
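Centring at the mean and scaling to one unit of standard deviation is the usual z-score transform. A minimal sketch for a single feature column follows; the function name and sample values are made up, and the use of the population standard deviation (dividing by n) is an assumption, since the lecture does not say which variant Azure ML uses.

```python
import math

def standardize(column):
    """Centre a feature column at its mean and scale it to unit standard deviation."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)  # population variance
    std = math.sqrt(variance)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

# Hypothetical values for one numeric feature.
ages = [20.0, 30.0, 40.0, 50.0]
scaled = standardize(ages)
print(scaled)
```

After the transform the column has mean 0 and standard deviation 1, so features measured on very different scales contribute comparably to the distance from the hyperplane.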
The next parameter is Project to the unit sphere. Projecting values to unit space means that, before training, the data points are centred at the mean. The random number seed is any integer value; as you know, I keep it at 123. Now, the Allow unknown category option basically creates a group for unknown values in the training or validation sets. So with these values for the parameters, I am going to run it. All right, it has run successfully.
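To see how the number of iterations, lambda, and the random seed interact, here is a rough sketch of a linear SVM trained by subgradient descent on the hinge loss with an L1 penalty. This is not Azure ML's implementation; the function names, learning rate, iteration count, lambda value, and toy data are all assumptions chosen for illustration.

```python
import random

def train_linear_svm(points, labels, iterations=200, lam=0.001, seed=123, lr=0.1):
    """Subgradient descent on hinge loss plus lam times the L1 norm of the weights."""
    rng = random.Random(seed)  # the seed only controls the visiting order
    n_features = len(points[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(points)))
    for _ in range(iterations):
        rng.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            violated = y * score < 1  # point is inside the margin or misclassified
            for j in range(n_features):
                # Subgradient of the L1 penalty: lam * sign(w_j).
                grad = lam * ((w[j] > 0) - (w[j] < 0))
                if violated:
                    grad -= y * x[j]  # hinge-loss subgradient
                w[j] -= lr * grad
            if violated:
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify x by the side of the learned hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data with labels +1 and -1.
points = [(1, 1), (2, 1), (2, 2), (-1, -1), (-1, -2), (-2, -1)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(w, b)
print([predict(w, b, x) for x in points])
```

A larger lambda pulls the weights harder towards zero, and more iterations give the weights more passes over the data to settle; with the same seed the run is reproducible.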
And let's visualise the output. As you can see, the accuracy as well as the AUC using SVM are slightly lower than those of the decision forest. This model is typically used when speed is more important than accuracy. All right, that concludes the lecture on the two-class support vector machine using Azure ML. Thank you so much for joining me in this class, and I will see you in the next one. Until then, have a great time.
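For reference, the two metrics compared here can be computed by hand as in the sketch below. The labels and scores are invented for the example and are not the results of the Adult Census experiment; the 0.5 threshold is also an assumption.

```python
def accuracy(labels, predictions):
    """Fraction of observations whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def auc(labels, scores):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and model scores for six test rows.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions))  # 4 of 6 correct
print(auc(labels, scores))            # 8 of 9 positive/negative pairs ranked correctly
```

Accuracy depends on the chosen threshold, while AUC looks only at how well the scores rank positives above negatives, which is why the two are usually reported together.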
Hyperparameter Tuning
1. Tune Hyperparameter for Best Parameter Selection