Hypothesis testing is a procedure for evaluating how consistent a given dataset is with a particular “null” model. Formally, all hypothesis testing scenarios can be described as follows.

Let $X_i$, for $i = 1, \ldots, n$, be a random variable, where each $X_i$ is sampled independently and identically according to some distribution $F$. Realizations of $X_i$ are $x_i$. We further assume that $F \in \mathcal{F}$, that is, $F$ is a distribution that lives in a model $\mathcal{F}$. Moreover, we partition $\mathcal{F}$ into two complementary sets, $\mathcal{F}_0$ and $\mathcal{F}_A$, such that $\mathcal{F}_0 \cup \mathcal{F}_A = \mathcal{F}$ and $\mathcal{F}_0 \cap \mathcal{F}_A = \varnothing$. Given these definitions, all hypothesis tests can be written as:

\begin{align} H_0: F \in \mathcal{F}_0, \qquad H_A: F \in \mathcal{F}_A. \end{align}

Note that a hypothesis can be either simple or composite: according to Wikipedia, a simple hypothesis is “any hypothesis which specifies the population distribution completely,” whereas a composite hypothesis is “any hypothesis which does not specify the population distribution completely.” Given this, we consider four different kinds of hypothesis tests:

  1. Simple-Null, Simple-Alternate (S/S) \begin{align} H_0: F = F_0, \qquad H_A: F = F_A. \end{align}

  2. Simple-Null, Composite-Alternate (S/C) \begin{align} H_0: F = F_0, \qquad H_A: F \in \mathcal{F}_A. \end{align}

  3. Composite-Null, Simple-Alternate (C/S) \begin{align} H_0: F \in \mathcal{F}_0, \qquad H_A: F = F_A. \end{align}

  4. Composite-Null, Composite-Alternate (C/C): \begin{align} H_0: F \in \mathcal{F}_0, \qquad H_A: F \in \mathcal{F}_A. \end{align}

Each of the above can be re-written in terms of parameters. Specifically, we can make the following substitutions:

\begin{align} F = F_j \Rightarrow \theta(F) = \theta(F_j), \qquad F \in \mathcal{F}_j \Rightarrow \theta(F) \in \Theta_j, \end{align} where $\theta(\cdot)$ maps a distribution to its parameter(s) and $\Theta_j = \{\theta(F) : F \in \mathcal{F}_j\}$ for $j \in \{0, A\}$.
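
For example, if we take $\mathcal{F}$ to be the family of Gaussians with unit variance (an arbitrary choice, purely for illustration), the parameterized version of an S/C test for a zero mean is:

\begin{align} \mathcal{F} = \{\mathcal{N}(\theta, 1) : \theta \in \mathbb{R}\}, \qquad \theta(F) = \mathbb{E}_F[X], \qquad \Theta_0 = \{0\}, \qquad \Theta_A = \mathbb{R} \setminus \{0\}, \end{align}

so that the test becomes $H_0: \theta(F) = 0$ versus $H_A: \theta(F) \neq 0$.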

Note that composite tests (those where the null or alternate is composite) can be either one-sided or two-sided. In particular, the simple null hypothesis $F = F_0$ can be restated: $F - F_0 = 0$. Given that, a one-sided simple/composite test is: \begin{align} H_0: F - F_0 = 0, \qquad H_A: F - F_0 > 0. \end{align} Of course, we could just as easily replace the $>$ in the alternate with $<$.
The two-sided composite test can be written: \begin{align} H_0: F - F_0 = 0, \qquad H_A: F - F_0 \neq 0, \end{align} or equivalently, \begin{align} H_0: F = F_0, \qquad H_A: F \neq F_0. \end{align}

In other words, the simple-null, two-sided composite test is equivalent to a test for equality (see below).

Thus, there are essentially three kinds of composite hypotheses:

  1. one-sided,
  2. two-sided, or
  3. neither (e.g., the alternative is a more complex set).
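
As a concrete sketch of the one-sided versus two-sided distinction in code (assuming roughly Gaussian data, a one-sample t-test on the mean, and a SciPy version recent enough to support the `alternative` keyword; these are illustrative choices, not the only way to do it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=100)  # data with a small positive mean shift

# Two-sided: H0: mu = 0 vs HA: mu != 0
t_two, p_two = stats.ttest_1samp(x, popmean=0.0, alternative="two-sided")

# One-sided: H0: mu = 0 vs HA: mu > 0
t_one, p_one = stats.ttest_1samp(x, popmean=0.0, alternative="greater")

print(f"two-sided p = {p_two:.3f}, one-sided (greater) p = {p_one:.3f}")
```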

Examples

Independence Testing

An independence test is one of the fundamental tests in statistics. In this case, let $(X_i, Y_i)$, for $i = 1, \ldots, n$, be sampled independently and identically from some joint distribution $F_{X,Y}$. Now we test: \begin{align} H_0: F_{X,Y} = F_{X} F_{Y} \qquad H_A: F_{X,Y} \neq F_{X} F_{Y}. \end{align}

In other words, we define $\mathcal{F}_0$ as the set of joint distributions on $(X, Y)$ such that the joint equals the product of the marginals, $\mathcal{F}_0 = \{F_{X,Y} : F_{X,Y} = F_X F_Y\}$, and we define $\mathcal{F}_A$ as the complement of that set, $\mathcal{F}_A = \mathcal{F} \setminus \mathcal{F}_0$. Independence tests are special cases of C/C tests.
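
A minimal sketch of such a test, using a permutation test with the absolute sample correlation as the dependence statistic (one convenient choice among many; correlation is mostly sensitive to linear dependence, whereas statistics like distance correlation catch more general alternatives):

```python
import numpy as np

def permutation_independence_test(x, y, n_perms=2000, seed=0):
    """Permutation test of H0: F_{X,Y} = F_X F_Y using |corr(x, y)| as the statistic."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    null_stats = np.empty(n_perms)
    for b in range(n_perms):
        # Permuting y breaks any dependence while preserving both marginals.
        null_stats[b] = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_perms)
    return observed, p_value

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)  # y depends on x, so H0 should be rejected
print(permutation_independence_test(x, y))
```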

Two-Sample Testing

Let $X_i$, for $i = 1, \ldots, n$, be a random variable, where each $X_i$ is sampled independently and identically according to some distribution $F_1$. Realizations of $X_i$ are $x_i$.

Additionally, let $Y_j$, for $j = 1, \ldots, m$, be a random variable, where each $Y_j$ is sampled independently and identically according to some distribution $F_2$. Realizations of $Y_j$ are $y_j$.

Given these definitions, any two-sample test can be written as:

\begin{align} H_0: F_1 = F_2 \qquad H_A: F_1 \neq F_2. \end{align}
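
For instance, using the two-sample Kolmogorov-Smirnov statistic (just one of many possible choices of test statistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, size=150)  # sample from F_1
y = rng.normal(loc=0.5, size=120)  # sample from F_2

stat, p_value = stats.ks_2samp(x, y)
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
```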

This can be written as an independence test. First, define the mixture distribution $F_U = \pi F_1 + (1 - \pi) F_2$, where $\pi \in (0, 1)$ and $U \sim F_U$. Now, sample $U_i \sim F_U$ for $i = 1, \ldots, N$ times. To make it exactly equal to the above, set $N = n + m$ and $\pi = n / (n + m)$. Moreover, define $V_i$ to be the latent “class label”, that is, $V_i = 1$ if $U_i$ was sampled from $F_1$ and $V_i = 2$ if $U_i$ was sampled from $F_2$. Now, we can form the independence test:

\begin{align} H_0: F_{UV} = F_U F_V \qquad H_A: F_{UV} \neq F_U F_V. \end{align}
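
A sketch of this reduction, using a permutation test on the difference of group means as the independence statistic for $(U, V)$ (the statistic is just a convenient choice for illustration; any independence test would do):

```python
import numpy as np

def two_sample_via_independence(x, y, n_perms=2000, seed=0):
    """Two-sample test as an independence test between pooled values U and class labels V."""
    rng = np.random.default_rng(seed)
    u = np.concatenate([x, y])                                   # pooled samples U_i
    v = np.concatenate([np.ones(len(x)), 2 * np.ones(len(y))])   # latent class labels V_i
    observed = abs(u[v == 1].mean() - u[v == 2].mean())
    null_stats = np.empty(n_perms)
    for b in range(n_perms):
        vp = rng.permutation(v)  # shuffling labels makes U and V independent by construction
        null_stats[b] = abs(u[vp == 1].mean() - u[vp == 2].mean())
    return (1 + np.sum(null_stats >= observed)) / (1 + n_perms)

rng = np.random.default_rng(3)
print(two_sample_via_independence(rng.normal(0.0, 1, 150), rng.normal(0.5, 1, 120)))
```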

Moreover, simple goodness-of-fit tests can be written as two-sample tests, which is readily apparent, as described below. Thus, simple goodness-of-fit tests are also independence tests.

Goodness-of-Fit Testing

The most general kind of goodness-of-fit test is a C/C test: \begin{align} H_0: F \in \mathcal{F}_0 \qquad H_A: F \notin \mathcal{F}_0. \end{align} In other words, $\mathcal{F}_A = \mathcal{F} \setminus \mathcal{F}_0$.

The S/C goodness-of-fit test is a special case: \begin{align} H_0: F = F_0 \qquad H_A: F \neq F_0. \end{align}

This special case is clearly an instance of a two-sample test; the only difference is that the second distribution is not estimated from the data, but rather, is provided by the null.
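
For example, a Kolmogorov-Smirnov goodness-of-fit test against a fully specified $F_0$ (here a standard normal, purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=0.2, scale=1.0, size=200)

# H0: F = F_0, where F_0 = N(0, 1); the null CDF is given, not estimated from data.
stat, p_value = stats.kstest(x, "norm", args=(0, 1))
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
```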

K-Sample Tests

More generally, assume there exist $K$ different distributions, $F_1, F_2, \ldots, F_K$; the $K$-sample test is:

\begin{align} H_0: F_1 = F_2 = \cdots = F_K \qquad H_A: \text{any not equal}. \end{align}

We can do the same trick as above, defining $F_U$ to be a mixture of the $K$ distributions, with $V_i$ denoting the component from which sample $U_i$ was drawn. Thus, $K$-sample tests are also independence tests.
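
The same label trick, sketched for $K$ groups, using the variance of the group means as the statistic (again, one convenient choice among many):

```python
import numpy as np

def k_sample_via_independence(samples, n_perms=2000, seed=0):
    """K-sample test as an independence test between pooled values U and group labels V."""
    rng = np.random.default_rng(seed)
    u = np.concatenate(samples)
    v = np.concatenate([np.full(len(s), k) for k, s in enumerate(samples)])

    def between_group_spread(labels):
        # Variance of the group means; large when the groups differ in location.
        return np.var([u[labels == k].mean() for k in range(len(samples))])

    observed = between_group_spread(v)
    null_stats = np.array([between_group_spread(rng.permutation(v)) for _ in range(n_perms)])
    return (1 + np.sum(null_stats >= observed)) / (1 + n_perms)

rng = np.random.default_rng(5)
groups = [rng.normal(mu, 1, 80) for mu in (0.0, 0.0, 0.6)]
print(k_sample_via_independence(groups))
```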

Paired K-Sample Testing

Let $X_i$ and $Y_i$ be defined as above, except now we sample matched pairs, $(X_i, Y_i)$ for $i = 1, \ldots, n$. The paired two-sample test is still: \begin{align} H_0: F_1 = F_2 \qquad H_A: F_1 \neq F_2, \end{align} but there exist more powerful test statistics that consider the “matchedness”. Of course, this also extends to the $K$-sample setting.

Note that this is a special case of $K$-sample testing, and thus is also an independence test.
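
As an illustration of how a matched statistic gains power, compare an unpaired test to a paired t-test on the within-pair differences (the Wilcoxon signed-rank test would be another standard choice; the data-generating setup is purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 60
baseline = rng.normal(size=n)                        # shared per-pair effect
x = baseline + rng.normal(scale=0.3, size=n)         # first member of each pair
y = baseline + 0.2 + rng.normal(scale=0.3, size=n)   # matched second member, shifted by 0.2

# The unpaired test ignores the matching; the paired test exploits it and is more powerful here.
print("unpaired p:", stats.ttest_ind(x, y).pvalue)
print("paired p:  ", stats.ttest_rel(x, y).pvalue)
```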

Test for Symmetry

Assume we desire to determine whether the distribution of $X$ is symmetric about zero. Define $F_+$ as the half of the distribution that is positive, and $F_-$ as the half that is negative. Then, we simply have a typical two-sample test: \begin{align} H_0: F_+ = -F_- \qquad H_A: F_+ \neq -F_-, \end{align} which is an independence test.
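
A sketch of this reduction: split the sample at zero, reflect the negative part, and run a two-sample test (KS here, purely as a convenient choice). Note that a complete symmetry test would also check that positive and negative values are equally likely; this sketch only illustrates the two-sample piece:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.gamma(shape=2.0, scale=1.0, size=300) - 2.0   # mean zero, but skewed (not symmetric)

pos = x[x > 0]              # draws from F_+
neg_reflected = -x[x < 0]   # draws from -F_-; under H0 these share the same distribution as pos

stat, p_value = stats.ks_2samp(pos, neg_reflected)
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
```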

Multinomial Test

Assume $X \sim \text{Multinomial}(n; p_1, p_2, p_3)$, and let: \begin{align} H_0: p_1 = p_2 = p_3 \qquad H_A: \text{not all equal}. \end{align}

This is again a $K$-sample test, which is an independence test.

Consider a slightly different case: \begin{align} H_0: p_1 = p_2 \qquad H_A: p_1 \neq p_2. \end{align} This is a slight generalization of the above, but it can still be written as a $K$-sample test, and is therefore an independence test.
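
Both cases can be sketched as chi-square tests on the observed counts (the chi-square statistic is one standard choice, and conditioning on $n_1 + n_2$ is one way to handle the second case):

```python
import numpy as np
from scipy import stats

counts = np.array([30, 45, 25])  # observed multinomial counts n_1, n_2, n_3

# H0: p_1 = p_2 = p_3: expected counts are equal thirds of the total (chisquare's default).
print("all equal p:", stats.chisquare(counts).pvalue)

# H0: p_1 = p_2 (p_3 unrestricted): condition on n_1 + n_2 and compare the first two counts.
print("p1 = p2 p:  ", stats.chisquare(counts[:2]).pvalue)
```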

One-Sided Test

Assume $X_i \sim \mathcal{N}(\mu, \sigma^2)$, sampled independently and identically, and consider the following test: \begin{align} H_0: \mu = 0 \qquad H_A: \mu \neq 0. \end{align}

This is a two-sample test, and therefore, an independence test. A slightly more complicated variant is: \begin{align} H_0: \mu < 0 \qquad H_A: \mu \geq 0. \end{align}

Viewing this as an independence test is a bit more complicated. In particular, if we set it up as the above two-sample test and we reject, it could be because $\mu > 0$ or because $\mu < 0$. In practice, however, we can address this by looking at the sign of the test statistic. Letting our test statistic be the sample mean $\bar{x}_n$, we consider four scenarios:

  1. If $\bar{x}_n < 0$ and we reject, then we know that $\mu$ is significantly below $0$, and therefore we should not reject the real (one-sided) null.
  2. If $\bar{x}_n < 0$ and we fail to reject, then we have failed to reject our original hypothesis, and we are ok.
  3. If $\bar{x}_n > 0$ and we reject, then $\mu$ is significantly above $0$, and therefore we are safe to reject the null.
  4. If $\bar{x}_n > 0$ and we fail to reject, that means $\mu$ is not significantly bigger than zero, so we cannot reject the original null.

Thus, we can use the test for whether $\mu = 0$ to test the one-sided hypothesis, probably at a fairly severe reduction in power, but nonetheless maintain a valid test.
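
A small simulation sketch of this argument, assuming Gaussian data and a one-sample t-test (both choices are purely for illustration): reject the one-sided null only when the two-sided test rejects and the sample mean is positive. The rejection rate stays at or below $\alpha$ whenever $\mu \leq 0$:

```python
import numpy as np
from scipy import stats

def one_sided_via_two_sided(x, alpha=0.05):
    """Reject H0: mu < 0 only if the two-sided test rejects AND the sample mean is positive."""
    _, p_two_sided = stats.ttest_1samp(x, popmean=0.0)
    return (p_two_sided < alpha) and (x.mean() > 0)

rng = np.random.default_rng(8)
n_sims, n = 2000, 50
for mu in (-0.5, 0.0, 0.5):
    rate = np.mean([one_sided_via_two_sided(rng.normal(mu, 1, n)) for _ in range(n_sims)])
    print(f"mu = {mu:+.1f}: rejection rate = {rate:.3f}")
```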