Lean Six Sigma Green Belt – Six Sigma Analyze Phase Part 9
- Regression Analysis_Part 2
Here is a case study. A dietitian wants to find out whether there is a relationship between calories consumed and weight gained. If a relationship exists, how can we establish it accurately? That is the question we have, and this is the pain area we need to address using regression analysis. Weight versus calories. Let us open that file in Minitab and go about doing this. The first step in regression analysis is to look into the relationship between the variables y and x. Weight gain is y. Is it dependent on calories consumed, which is your x? This is a simple linear regression problem. Weight gain is a continuous variable.
However finely I divide this, to whatever accuracy I represent it, it is still going to make sense, doesn't it? Say my wife's weight is 150 kg. She is going to sue me for this, right? I can represent my wife's weight as 150, or I can say 150.1, or I can go to another accuracy level, 150.16 or 150.162 kg. It still makes sense, doesn't it? The same goes for calories consumed. I can break the number down to whatever accuracy level I wish and it is still going to make sense, so calories consumed is continuous as well.
So if the two variables are continuous, that is when regression comes into the picture. The first step is to look at a scatter plot and visually try to establish whether there is a linear relationship or not. All right? Because your Pearson correlation coefficient doesn't give you the picture. And remember, there can be a high Pearson correlation value even if the relationship is not a straight line. Even if the relationship is curvilinear, a curve, it might still end up showing you a strong relationship. Hence, we first look at the scatter plot. Let's do that. Go to Graph.
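To make the point about curved relationships concrete, here is a minimal sketch in Python. It is not part of the Minitab exercise, and the data is made up for illustration: x runs from 1 to 10 and y is simply x squared, which is clearly a curve. The helper name `pearson_r` is my own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly curved (quadratic) relationship.
xs = list(range(1, 11))
curved = [x ** 2 for x in xs]

r_curved = pearson_r(xs, curved)
print(f"Pearson r for y = x^2: {r_curved:.3f}")  # well above 0.9 despite the curve
```

The r value alone would pass the "strong relationship" rule of thumb here, which is why the scatter plot comes first.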
Scatter plot in Minitab. Let me do that. Here we go. This is the example, right? I go to Graph. I click on Scatterplot. I choose a simple scatter plot and click OK. Here I need to enter weight gained in y and calories consumed in x. I have these values already entered here because I've done this exercise in the past; otherwise, this is how the screen would appear. You will have to click on the y variable field and select weight gained. Once you select it, the pointer automatically shifts to the x variables field. Now select calories consumed and click on Select. All right. Click on OK to view the scatter plot. Here is the scatter plot. It shows more or less a linear relationship. The points are fairly close to each other.
So there can be a strong positive correlation between weight gained and calories consumed. But we want to evaluate this using the Pearson correlation coefficient, the small r value. How do we do that? Go to Stat, Basic Statistics, Correlation. How do I go back to the worksheet? I'll simply click on this Show Worksheets Folder button. I click on that and I go back to the worksheet. I need to go to Stat, Basic Statistics, and click on Correlation. The moment I hover my mouse over it, it shows me this, right? A small popup which quickly explains what correlation is all about. Click on that. Since I have done this already, these values are selected. I remove them. This is how it looks when you open it for the first time. I select these two.
I click on Select. It pulls these two columns into the Variables field. There are a couple of methods: you can use either Pearson correlation or Spearman rho. I'm selecting the Pearson correlation coefficient. This class is limited to a few things, right? This is not a core statistics class; this is a Six Sigma session. In the core statistics class which XLR provides, you will learn about Spearman rho as well. In Master Black Belt, too, you learn that to some extent. All right, I'm just clicking on OK. Here goes. The Pearson correlation coefficient is 0.947.
Remember, anything greater than 0.8 or 0.85 is a strong relationship. So we have a strong relationship here. That is what it says. Only if I have a strong relationship can I proceed with regression analysis. How do I do that? Go to Stat, Regression, Regression, Fit Regression Model. I'll do that by going back to the worksheet in Minitab. How do I go back to the worksheet? I just click on this here, right? I need to go to Stat, Regression.
I'll go to Regression once again, and in that I'm going to choose Fit Regression Model. These values would already be filled in here if you had done this exercise before. Since we have not done this yet, click on Responses and select weight gained; under Continuous predictors, select calories consumed. Here we have categorical predictors also. If you have attribute data, you can do a logistic regression as well. All right, we'll discuss logistic regression in Black Belt, not in Green Belt. And now we just click on OK. Let us see how the output is going to look. Here I go.
I have an equation now. Weight gained = -626 + 0.4202 × calories consumed. Wow, I've got a prediction equation now. If anyone gives me the calories consumed, the dietitian is simply going to substitute that value into this equation to get the weight gained. How cool is that? But let us look at the entire analysis. We are not going into each and every component of it. We have degrees of freedom, adjusted sum of squares, adjusted mean square, F value and p value there, right? So the regression p value is 0.000, right? And calories consumed has a p value of 0.000.
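Substituting into the equation can be sketched in a few lines of Python. The intercept and slope below are the values read out in the session as I understand them (-626 and 0.4202); the function name and the 2000-calorie input are made up for illustration, and the units are whatever the worksheet uses.

```python
# Coefficients as read out from the Minitab output in the session.
INTERCEPT = -626.0
SLOPE = 0.4202

def predict_weight_gain(calories_consumed):
    """Apply the fitted line: weight gained = intercept + slope * calories consumed."""
    return INTERCEPT + SLOPE * calories_consumed

# Hypothetical input value, in the same units as the worksheet column.
example_calories = 2000
print(round(predict_weight_gain(example_calories), 1))
```

Keep in mind that a fitted line like this is only trustworthy inside the range of calorie values that were in the data set.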
So the regression and your x variable both have a p value which is less than 0.05. Hence, they are significant for predicting your weight gain. And look at the R-squared value, which is 89.68%, right? It's greater than 0.8, or 80%. That says that your model is good. The higher the R-squared value, the better your model's prediction would be. Roughly 89% of the variation in y is explained using this equation. That is what it says, right? All right, so let's go back to this. Here we go. So we have got the equation.
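For a simple linear regression with one x, R-squared is just the square of the Pearson r, which is why an r of 0.947 and an R-squared of 89.68% go together (0.947 squared is about 0.8968). The sketch below fits a least-squares line in plain Python and checks that identity. The calorie and weight numbers are made up for illustration and are not the session's worksheet; the function name is my own.

```python
import math

def fit_simple_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns r and R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared from residuals: 1 - (unexplained variation / total variation).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r, r_squared

# Hypothetical data, NOT the worksheet used in the session.
calories = [1500, 1800, 2000, 2200, 2500, 2800, 3000]
weight_gain = [30, 110, 240, 280, 430, 540, 640]

intercept, slope, r_value, r2_fit = fit_simple_regression(calories, weight_gain)
print(f"slope={slope:.4f}, intercept={intercept:.1f}")
print(f"r^2 = {r_value ** 2:.4f}, R-squared from residuals = {r2_fit:.4f}")

# Same identity applied to the r value reported in the session.
session_r = 0.947
print(round(session_r ** 2, 4))
```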
I can substitute calories consumed. From now on, when any person says, hey, I've consumed so many calories, I'll say, hey, do you know what? If you continue this lifestyle, then your weight gain would be so much. I'll be able to predict it. Now I can call myself a dietitian, right? All right, let's move on. Check for sufficiency of critical inputs. In this particular equation, assume it so happened that calories consumed was an insignificant variable. That means the p value was greater than 0.05. That means this x variable cannot be used for predicting your weight gained. Then I need to go back and identify a few more inputs.
- Regression Analysis_Part 3
So that is all about the step Check for sufficiency of critical inputs, right? If the critical inputs are found to have sufficient capability to realize the targeted improvement in Y, it is time to move to the Improve phase. If the critical inputs do not have sufficient capability, more critical inputs need to be found, or your project charter needs to be reviewed. And do you know what? You'll have to redo the entire Analyze phase once again, right? So with this context, let us move on. Outputs of the Analyze phase: we have come to the end.
Now, what did we do? What is the main output of the Analyze phase? The list of validated vital few inputs, the x's or predictors. Not just that, we have a few other outputs. We have identified all the potential inputs. That was the first output. We validated the root causes, we performed the gap analysis, and here comes the last slide, but the most important slide, right? What did we do in the Analyze phase? Let's summarize it. We identified the potential inputs using qualitative tools: time-based analysis, value stream mapping basically, and we also used a risk-based analysis, FMEA. Then we took the potential inputs and validated them using quantitative tools. Hypothesis testing is a quantitative tool, right?
We compared averages of two samples of populations using the two-sample t-test. We compared averages of more than two samples of populations using analysis of variance. We compared the proportions of two samples of populations using the two-proportion test. We compared the proportions of more than two samples of populations using the chi-square test of homogeneity. Post which, we looked into how to deal with data if both the variables x and y are continuous. The first step was to come up with a scatter plot, which helped us visually determine the correlation between two continuous variables, right? We could make a statement on whether x and y are positively correlated or negatively correlated.
The direction, basically. Whether it's linear or curvilinear, we could identify that, and then we looked into correlation. Correlation helps us determine the degree of linear association between two continuous variables. Then we moved on to regression. Regression helps us determine the equation to predict or forecast the output. And then we checked whether the statistically validated inputs, that is, the critical inputs, have sufficient capability to realize improvement in Y, the output. Right, there we come to the end of the Analyze phase. Thank you so much for attending this session. I'm looking forward to you guys listening in to the Improve phase and then the Control phase. Thank you so much. Bye.