Lean Six Sigma Green Belt – Six Sigma Analyze Phase Part 6
- Hypothesis Testing_Part 3
Let us understand the procedure of hypothesis testing. The acceptance of a hypothesis is determined using the p-value as follows. You first come up with your null and alternate hypotheses. You select which test statistic to use. You determine the significance level, which is the alpha value, right? In most cases, it is 5% or 0.05. Then you either determine the criteria for rejection and calculate the test statistic, or you calculate the significance probability, the p-value, from the test statistic. Finally, you determine whether to accept the hypothesis based on those statistics.
Here, you determine the acceptance of the hypothesis using the p-value. If the p-value is less than alpha, you reject the null hypothesis and accept the alternate hypothesis. If p is greater than alpha, then you accept the null and reject the alternate hypothesis. Basically, you either reject or accept the null hypothesis, and the statement about the alternate hypothesis follows from the statement that you’re making about the null hypothesis. For example, if I’m saying reject the null hypothesis, what does that mean? I am indirectly accepting the alternate hypothesis.
If I say accept the null hypothesis, what am I saying? I’m saying reject the alternate hypothesis, right? In order to remember this, there is a shortcut: p high, null fly. That means if the p-value is greater than alpha, you accept the null hypothesis. And p low, null go. That means if the p-value is less than alpha, you’re going to reject the null hypothesis. Wow, that’s an easy way to remember, isn’t it? Let us understand a few things about the p-value itself. If we reject the null hypothesis, the p-value is treated as the probability of being wrong, right? In other words, if we reject the null hypothesis, the p-value is read as the probability of making a type one error. Strictly speaking, it is the probability of seeing data at least this extreme when the null hypothesis is true. Hence, when it is below alpha, which is the acceptable value for type one error, you say null go: I reject the null hypothesis, right? Okay. Alpha is the critical value at which the null hypothesis is rejected. So, if the p-value is less than alpha, you reject the null hypothesis, and the p-value applies to all hypothesis testing, all the tests, right? P is alive from here on, at least for the hypothesis testing part.
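The shortcut above can be written as a tiny decision rule. This is only a sketch: the function name `decide` and the two p-values are illustrative, not from the course material, and alpha defaults to the 5% mentioned earlier. The result is worded as "fail to reject H0", which is the more cautious way of saying what the lesson calls accepting the null hypothesis.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about the null hypothesis H0 at level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"          # p low, null go
    return "fail to reject H0"      # p high, null fly


# Illustrative p-values only (hypothetical, not Minitab output)
print(decide(0.30))
print(decide(0.01))
```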
All right, hypothesis testing and the various graphical tools which are used, right? First things first. We need to identify what the data type of my output Y is, what the data type of my input X is, and which test I have to perform based on those data types. If Y is continuous and X is discrete with more than two categories, you use the ANOVA test. If Y is continuous and X is discrete in two categories, you go with the t-test. If both are continuous, you go with regression. If Y is discrete and X is discrete in two categories, you use the two-proportion test. If Y is discrete and X is discrete in multiple categories, you use the chi-square test. We will understand each and every test using some examples. But for now, let us focus on what to do if Y is continuous and X is discrete. The following flowchart will explain this. Here is the flowchart. If Y is continuous and X is discrete, and if you have two categories in the discrete X, that is, if you’re comparing two populations or samples with each other, this is what you do.
You check whether Y1 and Y2 are normal or not. If they are normal, then you check whether the variances are equal or not. Depending on whether the variances are equal or not, you’ll perform either the two-sample t-test assuming unequal variances or the two-sample t-test assuming equal variances. What if the data are not normal? You perform the Mann-Whitney test for comparison of medians, right? Now let us understand the other part.
If Y is continuous and X is discrete, and if you’re comparing more than two populations, then ask: are Y1, Y2, Y3 all normal? If the answer is yes, check whether the variances are equal or not. If the variances are equal, you do the one-way ANOVA test. If the variances are not equal, you end up comparing the means in pairs using the two-sample t-test. What if the data are not normal? You use the Kruskal-Wallis test for comparison using medians. All the tests for normal data here are related to means, right? But if the data are not normal, you compare the medians.
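The selection rules and the flowchart described above can be sketched as a small lookup function. The function name `choose_test`, its parameter names, and the returned labels are assumptions made for illustration; only the branching itself comes from the lesson.

```python
def choose_test(y_type, x_type, groups=2, normal=True, equal_variances=True):
    """Pick a test using the data-type rules and flowchart from the lesson.

    y_type, x_type: "continuous" or "discrete"
    groups: number of categories in a discrete X (2 or more)
    normal, equal_variances: outcomes of the normality and variance checks
    """
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    if y_type == "continuous" and x_type == "continuous":
        return "regression"
    if y_type == "discrete" and x_type == "discrete":
        return "2-proportion test" if groups == 2 else "chi-square test"
    if y_type == "continuous" and x_type == "discrete":
        if groups == 2:
            if not normal:
                return "Mann-Whitney test"
            if equal_variances:
                return "2-sample t-test (equal variances)"
            return "2-sample t-test (unequal variances)"
        # More than two groups
        if not normal:
            return "Kruskal-Wallis test"
        if equal_variances:
            return "one-way ANOVA"
        return "pairwise 2-sample t-tests"
    raise ValueError("combination not covered by the lesson's flowchart")


# Two groups, normal data, equal variances
print(choose_test("continuous", "discrete", groups=2))
```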
- Hypothesis Testing_2 Sample t test_Part 1
Here is the case study. First let us understand that, and then we’ll understand the steps involved. Marketing Strategy: there is a financial analyst at a financial institute who wants to evaluate a recent credit card promotion. After the promotion, 450 card holders were randomly selected. Half of them received an ad promoting a full waiver of the interest rate on the purchases which they make over the next three months. So over the next three months, purchase as much as possible and there would be no interest. Okay, that’s one ad. Another ad was sent to the other half, and this ad was a standard Christmas advertisement. Christmas is approaching and people have started promoting based on this. Now the financial analyst wants to understand whether the ad promoting the full interest rate waiver actually increased purchases, yes or no.
That’s the problem which we have to solve here. So, first things first, let us try to understand what Y is here and what X is here. Y is purchase per customer. How much did each person purchase? Only based on that will you be able to conclude whether your new ad is effective or your standard ad is effective. Right? And X is the type of promotion. And what are the various types of promotions which we have here? One is the ad which was promoting a full waiver of the interest rate. The other is the standard Christmas advertisement. So there are two advertisements. Purchase per customer is your output. So Y is continuous and X is discrete. Let me go back to the flowchart. What do you do? I’m comparing two populations, standard and new, and I’m comparing the mean purchase, right? I need to check whether the data are normal or not. That’s my first step, right? And the second step is that I need to check whether the variances are equal or not. And based on that, I need to select an appropriate test.
All right, that’s the first step. Are the data normal or not? And this is the Minitab navigation which is provided. Then you perform a variance test, either an F-test or a newer test called Bonett’s test. You can perform either of those tests, and as part of step three, you look into the two-sample t-test. But will it be assuming equal variances or will it be assuming unequal variances? That’s the decision that we make based on step two. All right, so what are we waiting for? Let us open this marketing strategy data in Minitab and try to solve the problem. Here is the marketing strategy problem. I have the amount which people have spent using the credit card, in dollars, and I have the type of promotion. Is this promotion related to the interest rate waiver or is it the standard Christmas promotion? I’ve unstacked it so that it appears in two columns. Now let us check whether the data are normal or not. What do I do? I simply go to Stat, Basic Statistics, Graphical Summary. I select both of these, interest rate waiver and standard promotion, and click on OK. I get these two reports, right? A report for standard promotion and a report for interest rate waiver. I get two charts.
Those charts are placed here. All right. Even before we continue, you need to first come up with your null and alternate hypotheses for any test which you are performing. The null hypothesis says don’t take any action, the data are normal, you need not do anything. The alternate hypothesis says the data are not normal, right? That is what they say. Now you are checking the Anderson-Darling p-value. You have done everything and you’re looking into the Anderson-Darling p-value. And we know that if p is high, that means if p is greater than 0.05, the null flies. Why am I taking the alpha value as 5% or 0.05? Because that is universally acceptable, unless your management comes to you and says, hey, no, I will not allow a 5% chance of making a mistake, take only 1%. Until someone says that, you go with 5%. One minus alpha is the confidence level which is provided here. Hence alpha is 5% and the confidence level, one minus alpha, is 95%. All right. So here is the p-value in both of these cases for the Anderson-Darling normality test. This p-value is greater than 0.05, and it is clearly saying that if p is high, null fly.
That means you’re going to accept the null hypothesis. And what is the null hypothesis saying? It is saying that the data are normal. And remember this flowchart, it is extremely important. You’re checking for normality, and if you conclude that the data are normal, you need to perform the next test, which is: are the variances equal or not? Move on. So this is the navigation, right? Basic Statistics, 2 Variances. Here are the null hypothesis and the alternate hypothesis. The null hypothesis says both the variances are equal. All is fine, don’t take any action. The alternate hypothesis, on the other side, says the variance of A is not equal to the variance of B. Now we have to perform this test using Minitab.
Go to Stat, Basic Statistics, 2 Variances. Let me go to Minitab and let us do that. All right, where am I here? Let me go back to the worksheet by clicking on this. Now we need to check whether the variances are equal or not. You need to go to Stat, Basic Statistics. Click on 2 Variances. I clicked on that. Now it’s coming up. Each sample is in its own column, columns C5 and C6. That is what I’ve selected. I clicked on this. I selected C5. Oh my God. Let me remove that. Okay, let me select this. All right, let me remove this. Let me select sample two.
I simply click on OK. All right, I got the result. But if I want to look at the text, I go to the session folder and click on this. It’s the same thing which is copied and pasted in our presentation. So we’ll go to the presentation and understand this Bonett’s test value. Here we go. Presentation mode on. For Bonett’s test, the p-value is 0.662. And what is this Levene’s test?
If you assume that the data are normal, you go with Bonett’s test. If you do not assume that the data are normal, you go with Levene’s test. We have already proven that the data are normal, right? Then why am I making the statement that you have to assume that the data are normal? Why am I saying that I’m assuming? Think about population and sample. I am working on the sample. Mind you, I have access only to the sample, right? So I’m working on the small sample and I found that the data are normal. If the data are normal for the sample, are they normal for the population? You really don’t know, right? But you’re assuming that the inferences that you make about the sample would be applicable to your population as well. Hence, I’m saying assume. So if you assume that the data are normal, you go with the Bonett’s test p-value. If you assume that the data are not normal, you go with Levene’s test. Anyway, let us look into this p-value.
The p-value here is 0.662. Is this p-value greater than 0.05? Oh yeah, it is greater than 0.05. P high, null fly. So you accept the null hypothesis, which says both the variances are equal. Let us go back to the flowchart and see what happens if the variances are equal. If the variances are equal, you’re going to compare the means using the two-sample t-test assuming equal variances. Now let me go there. Let me go to the third step. All right. Here also, first things first: you need to write down the null and alternate hypotheses. You’re comparing means using the two-sample t-test. The null hypothesis says mu of process A is equal to mu of process B. The alternate hypothesis says both are not equal: mu of A is not equal to mu of B. Right? This is the Minitab navigation which you can perform. Let me make a small correction here. It is the two-sample t-test, by the way. All right, that’s a small one. Go back to the worksheet.
And how do you navigate? Go to Stat, Basic Statistics. Click on 2-Sample t. Each sample is in its own column, right? Since I’ve already performed this exercise, it is showing those values here. Otherwise, the values would not be populated, right? All right. Sample one is interest rate waiver. That’s the first advertisement type. And then we have the standard promotion advertisement type. I click on Options here. And don’t forget to select this option called Assume equal variances. That is what we are doing right here. Here also, we are assuming that the variances are equal. Why am I saying that we are assuming? We have already proven that the variances are equal, right? I’m assuming because whatever analysis I’m doing is based on a sample, and then I’m going to draw inferences about the population. So simply click on OK, and OK again. Here we go. This output is copied and pasted here in the presentation.
So let us move there. Here, you go to this option and select Assume equal variances. This is extremely important. And when you click on OK, magic happens using the magic box, Minitab, right? And here is the p-value: 0.024. Is this p-value greater than 0.05? No, it is less than 0.05. If p is low, null go. That means you’re going to reject the null hypothesis. So you’re simply going to reject this null hypothesis and say that the mean amount spent because of the new interest waiver promotion and the mean amount spent because of the standard Christmas promotion are not the same. That is what you’re concluding here, right? And you might ask me a question: what if I want to find out which promotion has generated more revenue for me? Yeah, that would be discussed in your Six Sigma Black Belt. Right? We will limit ourselves to a few things in Six Sigma Green Belt. In Black Belt, we’ll look into the entire concept, by the way. All right, this is all about the first case study. You might have a few doubts, right? Why do we check whether the data follow a normal distribution or not? The two-sample t-test assumes that the data follow a normal distribution. Hence, we perform the normality test, because we want to know which test we have to perform.
Right? So we carried out the Anderson-Darling normality test and we found the p-value. And we know that if the p-value is greater than 0.05, we consider that the data are normal. What is Bonett’s test for? Bonett’s test is used to compare the variances of two groups. Right? Here it is. The Bonett’s test results help us determine whether we should assume equal variances or whether we have to perform the test assuming unequal variances.
What is the two-sample t-test, or two independent sample t-test? It compares and tests the means of two independent populations. Right? Two independent sample t-tests are of two types, and we choose one of them depending on Bonett’s test. We either do the two independent sample t-test assuming equal variances, or we perform the two independent sample t-test assuming unequal variances. All right. Do you still have challenges with hypothesis testing? Do not worry, we still have a number of case studies left, and things will become extremely clear.
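For readers who want to see what the equal-variance version computes, here is a minimal sketch using only the Python standard library. It is not Minitab's implementation. The function names and the two short samples are made up for illustration and are not the case-study data. The p-value helper uses a normal approximation to the t distribution, which is a swapped-in shortcut that is only reasonable for large samples such as the 225 per group in the case study; a statistics package would use the exact t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for the equal-variance test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / std_err
    return t_stat, df


def approx_two_sided_p(t_stat):
    """Two-sided p-value from a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Hypothetical spend amounts in dollars, far smaller than the real samples
waiver = [120, 135, 150, 110, 140, 160]
standard = [100, 115, 105, 125, 95, 120]

t_stat, df = pooled_t(waiver, standard)
print(t_stat, df, approx_two_sided_p(t_stat))
```

The null hypothesis here is that the two means are equal, so the resulting p-value is compared with alpha exactly as in the p low, null go rule.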