8.3 Statistical Test for Population Mean (Small Sample)
In this section we will adjust our statistical test for the population mean to apply to small sample situations. Fortunately, this will be easy; in fact, once you understand one statistical test, additional tests are easy to learn, since they all follow a similar procedure.
The only difference in performing a "small sample" statistical test for the mean, as opposed to a "large sample" test, is that we do not use the normal distribution as prescribed by the Central Limit Theorem, but instead a more conservative distribution called the T-Distribution. The Central Limit Theorem applies best when sample sizes are large, so we need to make an adjustment in computing probabilities for small sample sizes. The appropriate function in Excel is the TDIST function, defined as TDIST(T, N-1, TAILS), where:

- T is the value for which we want to compute the probability
- N is the sample size (and N-1 is frequently called the "degrees of freedom")
- TAILS is either 1 (for a 1-tail test) or 2 (for a 2-tail test). Since we again consider 2-tailed tests only, we always use 2.

With that new Excel function, our test procedure for a sample mean with a small sample size is as follows:
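If you want to check TDIST's output outside of Excel, the same probability can be computed in, say, Python. This is only a sketch: the helper `tdist` below is a hypothetical stand-in for Excel's function, obtained by numerically integrating the T-Distribution's density with Simpson's rule, using nothing but the standard library.

```python
import math

def tdist(t, df, tails):
    """Mimic Excel's TDIST(t, df, tails): the probability that a
    T-distributed value with df degrees of freedom exceeds t (t >= 0),
    doubled when tails = 2."""
    if t < 0:
        raise ValueError("TDIST requires t >= 0 (apply ABS first)")
    # density of the T-Distribution with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    # Simpson's rule for the area under the density between 0 and t
    n = 2000
    h = t / n
    area = pdf(0.0) + pdf(t)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    one_tail = 0.5 - area * h / 3.0
    return tails * one_tail

print(round(tdist(2.28, 19, 2), 3))  # → 0.034, matching Excel's TDIST(2.28, 19, 2)
```

Note that, just like Excel's TDIST, this helper refuses negative t; you drop the sign first, which is exactly why the test below uses ABS(T).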
Statistical Test for the Mean (small sample size N < 30):
Fix an error level you are comfortable with (something like 10%, 5%, or 1% is most common). Denote that "comfortable error level" by the letter "A". Then set up the test as follows:
- Null Hypothesis H0:
- mean = M, i.e. the mean is a known number M
- Alternative Hypothesis Ha:
- mean ≠ M, i.e. mean is different from M (2-tailed test)
- Test Statistics:
- Select a random sample of size N, compute its sample mean X and the standard deviation S. Then compute the corresponding t-score as follows:
T = (X - M) / ( S / sqrt(N) )
- Rejection Region (Conclusion):
Compute p = 2*P(t > |T|) = TDIST(ABS(T), N-1, 2)
If the probability p computed in the above step is less than A (the error level you were comfortable with initially), you reject the null hypothesis H0 and accept the alternative hypothesis. Otherwise you declare your test inconclusive.
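The whole procedure can be sketched end-to-end in code. This is only a sketch, not part of the Excel-based approach above: `t_test_mean` is a hypothetical helper name, the sample below is made-up data, and the TDIST probability is replaced by a numerical integration of the T-Distribution's density (standard-library Python only).

```python
import math

def t_test_mean(sample, M, A):
    """Two-tailed small-sample test of H0: mean = M at error level A.
    Returns (T, p, conclusion)."""
    N = len(sample)
    X = sum(sample) / N                                          # sample mean
    S = math.sqrt(sum((x - X) ** 2 for x in sample) / (N - 1))   # sample std. dev.
    T = (X - M) / (S / math.sqrt(N))                             # test statistic
    # p = 2 * P(t > |T|) with N-1 degrees of freedom, via Simpson's rule
    df, t = N - 1, abs(T)
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    n = 2000
    h = t / n
    area = pdf(0.0) + pdf(t)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    p = 2 * (0.5 - area * h / 3.0)
    conclusion = "reject H0" if p < A else "inconclusive"
    return T, p, conclusion

# made-up sample of N = 8 measurements, testing H0: mean = 10 at A = 0.05
T, p, verdict = t_test_mean([9.1, 10.4, 9.8, 8.7, 9.5, 10.1, 9.2, 9.0], 10, 0.05)
```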
Comments:
- The null and alternative hypotheses for this test are the same as before
- The calculation of the test statistics is the same as before, but the result is called T instead of Z (oh well -:)
- The TDIST function is similar to the NORMSDIST function, but it does not work for negative values of T (a limitation of Excel), and it automatically gives a "tail" probability. Thus, the computation of the p-value had to be adjusted accordingly.
- The ABS function in the above formulas stands for the "absolute value" function. (In other words, just drop any minus signs ... -:)
Example 1: A group of secondary education student teachers were given 2 1/2 days of training in interpersonal communication group work. The effect of such a training session on the dogmatic nature of the student teachers was measured by the difference of scores on the "Rokeach Dogmatism Test" given before and after the training session. The difference "post minus pre score" was recorded as follows:
-16, -5, 4, 19, -40, -16, -29, 15, -2, 0, 5, -23, -3, 16, -8, 9, -14, -33, -64, -33

Can we conclude from this evidence that the training session makes student teachers less dogmatic (at the 5% level of significance)? This is of course the same example as before, where we incorrectly used the normal distribution to compute the probability in the last step. This time we will do it correctly, which is fortunately almost identical to the previous case (except that we use TDIST instead of NORMSDIST):

- Null Hypothesis: there is no difference in dogmatism, i.e. mean = 0
- Alternative Hypothesis: dogmatism is different, i.e. mean not equal to 0
- Test Statistics: sample mean = -10.9, standard deviation = 21.33, sample size = 20. Compute
  T = (-10.9 - 0) / (21.33 / sqrt(20)) = -2.28
- Rejection Region: We use Excel to compute p = TDIST(2.28, 19, 2) = 0.034, or 3.4%. That probability is less than 0.05, so we reject the null hypothesis.

Note that in the previous section we (incorrectly) computed the probability p to be 2.2%; now it is 3.4%. The difference is small, but it can be significant in special situations. Thus, to be safe:

- if N > 30, use the Z-Test based on the standard normal distribution NORMSDIST, as in the previous section
- if N < 30, use the T-Test based on the T-Distribution TDIST, as in this section

Example 2: Suppose GAP, the clothing store, wants to introduce their line of clothing for women to another country. But their clothing sizes are based on the assumption that the average size of a woman is 162 cm. To determine whether they can simply ship the clothes to the new country, they select 5 women at random in the target country and determine their heights as follows:

149, 165, 150, 158, 153

Should they adjust their line of clothing, or can they ship it without change? Make sure to decide at the 0.05 level.

By now statistical testing is second-nature (I hope -:)

- Null Hypothesis: the mean height in the new country is the same as in the old country, i.e. M = 162
- Alt. Hypothesis: the mean height in the new country is different from that in the old country, i.e. M not equal to 162 (either too small or too tall would be bad for GAP)
- Test Statistics: we can compute the sample mean = 155 and the sample standard deviation = 6.59, while the sample size is clearly N = 5. Therefore
  T = (155 - 162) / ( 6.59 / sqrt(5) ) = -2.37
- Rejection Region: We use Excel to compute p = TDIST(2.37, 4, 2) = 0.077. Thus, if we did decide to reject the null hypothesis, the probability of that decision being wrong would be 7.7%. That is larger than 0.05, so we declare the test inconclusive.

Note that our test is inconclusive, which does not mean that we accept the null hypothesis. Thus, we don't recommend anything to GAP. Using common sense, however, we recommend that GAP conduct a new study, but this time select a random sample of (much) larger size, something like 100 or more. Hopefully the new study will provide statistically significant evidence.
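As a sanity check, the numbers from both examples can be reproduced outside of Excel. The sketch below uses plain Python, with a hypothetical `two_tailed_p` helper standing in for TDIST (it numerically integrates the T-Distribution's density with Simpson's rule):

```python
import math

def two_tailed_p(t, df, n=2000):
    """2 * P(T > |t|) for the T-Distribution with df degrees of freedom
    (a stand-in for Excel's TDIST, via Simpson's rule on the density)."""
    t = abs(t)
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    h = t / n
    area = pdf(0.0) + pdf(t)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * (0.5 - area * h / 3.0)

def t_score(sample, M):
    """T = (X - M) / (S / sqrt(N)) for the given sample and claimed mean M."""
    N = len(sample)
    X = sum(sample) / N
    S = math.sqrt(sum((x - X) ** 2 for x in sample) / (N - 1))
    return (X - M) / (S / math.sqrt(N))

dogmatism = [-16, -5, 4, 19, -40, -16, -29, 15, -2, 0, 5, -23, -3, 16,
             -8, 9, -14, -33, -64, -33]
heights = [149, 165, 150, 158, 153]

T1 = t_score(dogmatism, 0)                  # Example 1: about -2.29
T2 = t_score(heights, 162)                  # Example 2: about -2.37
p1 = two_tailed_p(T1, len(dogmatism) - 1)   # about 0.034 -> reject H0
p2 = two_tailed_p(T2, len(heights) - 1)     # about 0.077 -> inconclusive
```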
If you compare the T values in Examples 1 and 2 you see that they are very similar. Yet, the associated probabilities are quite different. That is because in Example 1 we had a relatively large number of degrees of freedom (19), while in Example 2 the degrees of freedom were very small (4). The smaller the degrees of freedom, the heavier the tails of the T-Distribution, so the same T value corresponds to a larger probability.
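You can see this effect directly by holding the T value fixed and varying only the degrees of freedom. The sketch below reuses the same hypothetical stand-in for TDIST (plain Python, numerical integration of the T-Distribution's density):

```python
import math

def two_tailed_p(t, df, n=2000):
    """2 * P(T > |t|) for the T-Distribution with df degrees of freedom
    (a stand-in for Excel's TDIST, via Simpson's rule on the density)."""
    t = abs(t)
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    h = t / n
    area = pdf(0.0) + pdf(t)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * (0.5 - area * h / 3.0)

# same T value, increasing degrees of freedom: the probability shrinks
# because the T-Distribution's tails get lighter as df grows
for df in (4, 9, 19, 100):
    print(df, round(two_tailed_p(2.3, df), 3))
```

For very large degrees of freedom the result approaches what NORMSDIST would give, which is exactly why the Z-Test is fine for N > 30.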