Lean Six Sigma Green Belt – Six Sigma Analyze Phase Part 7
- Hypothesis Testing_ANOVA_Part 1
First, there is a marketing organization that outsources its back-office operations to three different suppliers. So the count is three, right? The contracts are up for renewal, and the chief marketing officer wants to determine whether to renew the contracts with all the suppliers or with only one specific supplier. The chief marketing officer wants to renew the contract of the supplier who has the least transaction time, and will renew all the contracts if the performance of all the suppliers is similar.
Now go ahead, do the analysis, and tell your chief marketing officer which of the three suppliers is doing a good job. First things first: we need to identify what is Y and what is X. What do you think is the Y here? It's the transaction time. Simple. What is X? We have three different suppliers, so the vendors or suppliers A, B and C are your X.
Is transaction time continuous? Yes, Y is continuous. And the suppliers A, B and C, is that discrete? Yes, it is discrete. So what happens if Y is continuous and X is discrete with more than two categories? Remember, Y is continuous, X is discrete, and you're comparing more than two population samples with each other, right? We are comparing three different vendors, three suppliers.
Hence we go to the right side of the flowchart. First, find out whether all the outputs are normal or not. If they are normal, you check whether the variances are equal or not. If the variances are equal, you perform the one-way ANOVA test. If the variances are not equal, you perform a different test. Let us continue and check what happens with the data. All right, first things first: let us compose the null and alternate hypotheses. Your null hypothesis says the data are normal. Your alternate hypothesis says the data are not normal.
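The flowchart logic described so far can be sketched in a few lines of Python. This is only an illustration of the decision path as it is described in this lecture; the function name `choose_test` and its parameters are hypothetical, and the branches the lecture does not name are returned as placeholders rather than guessed.

```python
def choose_test(y_type, x_categories, all_normal=True, equal_variances=True):
    """Pick a test following the flow described in the lecture.

    y_type: "continuous" or "discrete" (type of the output Y)
    x_categories: number of categories of the discrete input X
    all_normal / equal_variances: outcomes of the two preliminary checks
    """
    if y_type == "discrete":
        # Both Y and X discrete with two input categories
        if x_categories == 2:
            return "2-proportion test"
        return "not covered in this section"

    # Y is continuous from here on
    if x_categories == 2:
        return "2-sample t-test"
    if x_categories > 2:
        if not all_normal:
            return "not covered in this section"
        if not equal_variances:
            # The lecture only says "a different test" here
            return "a different test (not named in this section)"
        return "one-way ANOVA"
    return "not covered in this section"


# Contract renewal case: continuous transaction time, three suppliers
print(choose_test("continuous", 3))
```

For the contract renewal case, Y is continuous and X has three categories, so the sketch lands on one-way ANOVA, provided the normality and equal-variance checks pass.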
When you perform normality tests, you have to look at the Anderson-Darling p-value. How do you perform the normality test in Minitab? You just go to Stat > Basic Statistics > Graphical Summary, and you get to see the three reports which are shown here, with 95% confidence. That means my alpha value is 5%, or 0.05. And we know that if the p-value is greater than 0.05, we assume that the data are normal. In all three cases the p-value is greater than 0.05, so the data are normal.
If the data are normal, the second step you have to perform is the test for equal variances. First, compose your null hypothesis and your alternate hypothesis. Your null hypothesis says all the variances are equal. Your alternate hypothesis says not all the variances are equal. So you write it down this way: variance of supplier A is equal to variance of supplier B is equal to variance of supplier C. That's my null hypothesis.
And if I say the variances of supplier A, supplier B and supplier C are not all equal, that is my alternate hypothesis. Right. How do you perform this test? Go to Stat > ANOVA > Test for Equal Variances. When you run that, you see that for the multiple comparisons test the p-value is 0.674. If the p-value is greater than 0.05, what do you say? You say "P high, null fly". So you fail to reject the null hypothesis, which says all the variances are equal. If all the variances are equal, what do you do next? You perform the one-way ANOVA test. Once again, the first thing is composing your null and alternate hypotheses. As part of the null hypothesis, you're saying that all the means are equal. That means the average transaction time of supplier A is equal to the average transaction time of supplier B, which is equal to the average transaction time of supplier C.
Right. What does your alternate hypothesis say? It says the average transaction times of supplier A, supplier B and supplier C are not all equal; at least one of them is different. You can either represent it using this notation or you can write it out in words. Why am I using the mu symbol here? Because you do hypothesis testing to make inferences about your population.
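Written in the mu notation just mentioned, the two hypotheses for the one-way ANOVA could look like the following. The subscripts A, B and C for the three suppliers are the only assumption here; note that the alternate hypothesis is stated as "at least one mean differs", which is the usual formal reading of "not all means are equal".

```latex
\begin{aligned}
H_0 &: \mu_A = \mu_B = \mu_C \\
H_a &: \text{at least one of } \mu_A,\ \mu_B,\ \mu_C \text{ differs from the others}
\end{aligned}
```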
Though you have a sample at hand, you are drawing inferences about the population, hence the statements contain the population notation, even though you are analyzing the sample. All right, this is the navigation path in Minitab. Once you do that, you see that the p-value is 0.4. Is it greater than 0.05 here? Yes, 0.4 is greater than 0.05. P high, null fly. All right, so you tell the chief marketing officer that the transaction time of all three suppliers is the same. So you'll have to renew the contracts of all the suppliers.
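For anyone who wants to see what the one-way ANOVA computes behind the Minitab menu, here is a minimal sketch in plain Python. The transaction times below are made-up illustrative numbers, not the worksheet from the case study, and the function names are hypothetical. The p-value shortcut is valid only when there are exactly three groups (two numerator degrees of freedom), where the F distribution's tail has a simple closed form; for any other number of groups you would need a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Upper-tail p-value of F(2, df_within); only valid for three groups."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


# Hypothetical transaction times (minutes) for three suppliers
supplier_a = [11, 12, 13]
supplier_b = [12, 13, 14]
supplier_c = [13, 14, 15]

f_stat, df_b, df_w = one_way_anova_f([supplier_a, supplier_b, supplier_c])
p_value = p_value_three_groups(f_stat, df_w)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")  # F = 3.000, p = 0.125

alpha = 0.05
print("P high, null fly" if p_value > alpha else "P low, null go")
```

With these illustrative numbers the p-value is above 0.05, so the sketch ends the same way the case study does: fail to reject the null hypothesis that the means are equal.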
Yeah, that is the conclusion that you draw here. Okay? Now, I need to show you this case study using Minitab. Let me open that. This is the one-way ANOVA contract renewal case study. It's opening in Minitab 17, so bear with me while this opens up. Just a minute, while the magic box opens up. Here is the case study, which has opened up in Minitab. There are three steps we need to perform here, right? The first step is to check whether the data follow a normal distribution or not. For that, you just go to Stat > Basic Statistics and do a graphical summary of all three suppliers.
I select all three, click on Select, and click on OK. I get three charts. This is for supplier C: the p-value is greater than 0.05, hence the data are normal. This is for supplier B: the p-value is greater than 0.05, so the data are normal. Even using this graph you can conclude that the data are normal, right? But you do not always want to bank on the diagrams. You want to prove it statistically, using numbers. Let us look at supplier A: the p-value is much higher than 0.05. It is 0.941. P high, null fly, and the null hypothesis says the data are normal.
If the data are normal, your second step is to test whether the variances are equal or not. For that, you go to Stat > ANOVA > Test for Equal Variances. I have done this exercise before, hence those values are appearing. Otherwise, this is how it would look. Choose "Response data are in a separate column for each factor level". Click in this window, select suppliers A, B and C, and click on Select. Now the magic box does the rest for you. Click on OK. It says that the multiple comparisons p-value is 0.674, which is greater than 0.05. P high, null fly. You go with the null hypothesis, which says all the variances are equal. You go with Levene's test if you do not assume that the data are normal. But in our case, we have assumed that the data are normal.
Hence I'm looking at multiple comparisons, right? Let us go back to the worksheet. If the variances are equal, you need to perform one-way ANOVA. That is what I'm doing here. Response data are in a separate column for each factor level. I click in this box called Responses, I select all three suppliers, and I click on OK. Now let me minimize this and go to the actual sheet. Here I get to see the p-value, which is 0.4. P high, null fly, right? So we fail to reject the null hypothesis, and the null hypothesis says the average transaction time of all three suppliers is the same. Hence we conclude in this way. All right, let me go back to the worksheet.
Here are the frequently asked questions, right? Why do we check whether the data follow the normal distribution or not? One-way ANOVA assumes that the data follow a normal distribution, and therefore we carry out the normality test. To check whether the data actually follow the normal distribution or not, we look at the Anderson-Darling normality test.
If the p-value is greater than 0.05, that is, more than alpha, we assume that the data are normal and we proceed. What are Levene's test and the multiple comparisons test? Both can be used to test the variances of several groups, basically if you have more than two groups. The multiple comparisons test for variances assumes that the data follow a normal distribution. However, Levene's test does not make any such assumption. And then we test for equal variances, which is carried out to satisfy the assumption of homoscedasticity, or equal variance, for one-way ANOVA. What is homoscedasticity? If all the random variables in a sequence have the same variance, that is called homoscedasticity.
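To make Levene's test a little less of a magic box, here is a rough sketch in plain Python. It assumes the median-centred variant (often called Brown-Forsythe): take each value's absolute deviation from its group median, then run a one-way ANOVA on those deviations. Whether this matches Minitab's output exactly is not verified here. The data and function names are hypothetical, and the p-value shortcut again only works for exactly three groups.

```python
from statistics import mean, median


def levene_statistic(groups):
    """Return (W, df_between, df_within) using median-centred deviations."""
    # Absolute deviation of every value from its own group median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / n_total
    # One-way ANOVA on the deviations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between = k - 1
    df_within = n_total - k
    w = (ss_between / df_between) / (ss_within / df_within)
    return w, df_between, df_within


def p_value_three_groups(stat, df_within):
    """Upper-tail p-value of F(2, df_within); only valid for three groups."""
    return (1 + 2 * stat / df_within) ** (-df_within / 2)


# Hypothetical transaction times with visibly different spreads
supplier_a = [10, 12, 14]
supplier_b = [11, 12, 13]
supplier_c = [12, 12, 12]

w, df_b, df_w = levene_statistic([supplier_a, supplier_b, supplier_c])
p_value = p_value_three_groups(w, df_w)
print(f"W = {w:.3f}, p = {p_value:.3f}")
print("variances look equal" if p_value > 0.05 else "variances differ")
```

The design point is that equal spread in the raw data shows up as equal means of the deviations, so a test about variances is turned into a test about means.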
You need to perform this because the assumption of homoscedasticity simplifies many mathematical and computational treatments, and serious violations of homoscedasticity may result in overestimating how well your model fits. Hence, you have to look into this test even before you proceed. All right, what is ANOVA? ANOVA stands for analysis of variance. We are doing one-way ANOVA, which tests whether the means of several groups are equal or not. Here are a few underlying assumptions of one-way ANOVA. Within each sample, the values are independent. The k samples are normally distributed. The samples are independent of each other. The k samples are all assumed to come from populations with the same variance. These are a few of the underlying assumptions. We have looked at two-sample tests and ANOVA. But what is the real difference between the two?
Look at this. Is the transaction time dependent on whether person A or person B processes the transaction? You have two people, person A and person B. Hence, it is a two-sample t-test. Is medicine one or medicine two more effective at reducing heart stroke? There are two things you are comparing, medicine one and medicine two. Hence, a two-sample t-test. Is a new branding program more effective in increasing profits? You're comparing two things, the old branding program and the new branding program. Hence, it is a two-sample t-test.
Two things you're comparing. If you're comparing more than two things, then you go with ANOVA. Look at these examples. Does the productivity of employees vary depending on the three levels: beginner, intermediate and advanced? Three different sale closing methods were used; which one is more effective? You are again speaking about three, which is greater than two. Hence you go with ANOVA. Four types of machines are used; does the weight of the rugby ball depend on the type of machine used? Look at this: you're looking at four types. Anything greater than two, you go with the ANOVA test. That's the difference. If you have two things to compare, go with the two-sample t-test. If there are more than two, go with one-way ANOVA. Okay.
- Hypothesis Testing_2 Proportion Test
This is a hypothesis test to compare two population proportions and determine whether they are statistically significantly different or not. Here are a few examples. You have samples of camera lenses from two different suppliers, and you have to determine whether the proportion of camera lenses that fail differs depending on the supplier. In this case, you go with the two-proportion test. You have data for a sample of animals that received an experimental vaccine, and you wish to determine whether the proportion of animals that get sick in this group is different from that of the unvaccinated group. Two-proportion test: you're comparing two proportions. For the two-proportion test, you compose the null and alternate hypotheses in this way: the null hypothesis says proportion A is equal to proportion B.
The alternate hypothesis says proportion A is not equal to proportion B, right? For this you look at the p-value, and if the p-value is less than alpha, we reject the null hypothesis. If we reject it, we obviously go with the alternate hypothesis, right? So let us look at the case study. Now, even before we get into the case study, here is a quick glimpse of when you have to use the two-proportion test. If both Y and X, both your output and your input, are discrete in nature, and if you have two categories of input, then you go with the two-proportion test. All right, let us read the case study and understand it right from the basics. Here is the Johnny Talkers case study. Johnny Talkers; now, don't get boozy, all right? Okay.
Johnny Talkers soft drinks division has been planning to launch a new sales incentive program for its sales executives. The sales executives have earlier suggested that adults who are older than 40 won't buy Johnny Talkers soft drinks; only the children would, right? But you, as a sales manager, do not want to take their word at face value. You want to analyze the data and determine whether there is evidence at the 5% significance level to support the hypothesis your sales executives are making. That's the case study. First, try to identify what is Y here. What is Y? The percentage of people who buy, or the percentage of people who do not purchase, right?
Because they're saying adults would not purchase, only children would purchase. So that is the output, right? Percentage who purchased or percentage who did not purchase: that is my Y, the output. What is X? X has two categories based on age. You have adults and children, right? These are the two categories. Both are discrete in nature. Hence we do the two-proportion test. When you do the two-proportion test, you get the p-value as zero, and the p-value is less than 0.05. In this case: P low, null go, right? So we reject the null hypothesis and go with the alternate hypothesis. And what does the alternate hypothesis say? The two proportions are not equal. So it clearly says that the proportion of purchases made by adults is not equal to the proportion of purchases made by children. That is what you can conclude. Let us do this in Minitab, by the way.
So let me go to Minitab. We are looking at two proportions. Here it is, Johnny Talkers. Let me open that. The magic box is coming up. Here is the magic box, right? You have the details, and here we go: adults who purchased or did not purchase, and children who purchased or did not purchase. This is the data we have collected. So we just need to do the two-proportion test.
For that, I'll go to Stat > Basic Statistics and go to 2 Proportions. Click here, right? These are already selected because I have done this exercise earlier, as I've told you. Otherwise, this is how it will look. Choose "Each sample is in its own column". So you just click in that box, select Adults with a double click, and then click on Children. Or do a single click and click on Select; both do the same thing. Okay. Simply click on OK. Here you get the result, and this result is also available here. That is how we conclude the two-proportion test.
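As a rough cross-check of what the two-proportion test does, here is a small sketch in plain Python using the normal approximation with a pooled proportion. The pooled standard error is an assumption of this sketch; Minitab offers more than one way of estimating it, so its numbers may differ slightly. The purchase counts are invented for illustration and are not the Johnny Talkers worksheet, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2 using a pooled proportion.

    x1, x2: number of "events" (for example, purchases) in each sample
    n1, n2: sample sizes
    Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 20 of 100 adults and 60 of 100 children purchased
z, p_value = two_proportion_z_test(20, 100, 60, 100)
print(f"z = {z:.3f}, p = {p_value:.6f}")

alpha = 0.05
print("P low, null go" if p_value < alpha else "P high, null fly")
```

With counts this far apart the p-value is far below 0.05, which mirrors the case study's conclusion that the two proportions are not equal.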