# Is the widespread usage of two-sided tests a result of a usability/presentation issue?

*Author: Georgi Z. Georgiev, Published: Aug 6, 2018*

Since I was born in an age of easy access to abundant computing power, until recently I did not fully grasp how much statistical applications, both applied and scientific, depended on pre-calculated tables of critical values and probabilities. When I realized how true this was in a previous era, it dawned on me that it might be a plausible factor in the widespread adoption of two-sided calculations, despite their being unsuited to supporting most claims made by applied and scientific researchers alike.

In short, **I believe that a significant reason for the
preference for two-sided tests over one-sided ones is the way the famous Fisher
tables of the t-distribution, z-distribution and χ² distribution
were tabulated and presented**.

To support this claim, I will first demonstrate the
importance and ubiquitous use of Fisher’s tables, presented for the first time
in his 1925 book "Statistical Methods for Research Workers" ^{[1]}. Then I will explain why I
believe there is a major flaw in their presentation and how it discouraged
the use of one-sided tests in the early days of statistics.
This practice then carried over to the present day, even though computing power
has made such tables redundant.

## The Importance of Fisher’s Tables

The importance of the tables of critical values and probabilities for different distributions that Fisher included in his book is rarely debated, but I believe this point cannot be stressed enough, so I will lay out some facts and citations from reputable sources to make it more vivid.

Writing about Fisher’s "Statistical Methods for Research
Workers" Vyas & Desai (2015) state ^{[2]}: "Here, Fisher included tables that gave the
value of the random variable for specially selected values of p that were much
more compact than Pearson’s detailed tables. An unexpected reason for a new way
of tabling is suggested by Jack Good: ‘Kendall mentioned that Fisher produced
the tables of significance levels to save space and to avoid copyright problems
with Karl Pearson, whom he disliked’. […] According to Conniffe, this work
‘went through many editions and motivated and influenced the practical use of
statistics in many fields of study’. Today it is considered the gold standard
of applied statistics for scientists in many fields. It seems that Fisher did
as much to popularize the use of statistics in other disciplines as he did to
contribute to its own development."

**How important statistical tables were at the time** is apparent from a
report in Kendall’s "Ronald Aylmer Fisher, 1890-1962" (1963) ^{[3]} on the dispute
between K. Pearson and Fisher over the reproduction of Pearson’s chi-squared
tables, which Fisher had asked to include in his book. "This was perhaps not
simply a personal matter because the hard struggle which Pearson had for long
experienced in obtaining funds for printing and publishing statistical tables
had made him most unwilling to grant anyone permission to reproduce. He was
afraid of the effect on sales of his Tables for Statisticians and Biometricians
on which he relied to secure money for further table publications. It seems,
however, to have been this refusal which first directed Fisher’s thoughts
towards the alternative form of tabulation with quantiles as argument, a form
he subsequently adopted for all his tables and which has become common
practice."

To get a sense of how much work that was: according to Box (1981) ^{[4]}, it **took several years and two
mechanical calculators to compute the tables to a satisfactory level of
precision**. For a full appreciation of the monumental work that tabulating the
values of the t distribution and others represented, I recommend reading Box (1981),
"Gosset, Fisher, and the t Distribution".

Yet another testament to that can be found in Krishnan (1997), "Fisher's Contributions to Statistics" ^{[5]}:
"These tables, together with those by
Pearson and Hartley, were **essential tools** of a statistician's trade in those
days when a statistical laboratory consisted of manually or electrically
operated calculating machines and even in the days of electronic desk calculators." (emphasis mine).

## How the tables are presented in the book

Again, according to Box, the tables published by Fisher were not only widely applicable but also a significant improvement over existing ones in terms of presentation: tabulating percentage points (quantiles) instead of probabilities not only made the tables shorter, but also made them much more readable and easier for practitioners to use.

Yet, likely to save space, the tables presented only two-tailed probabilities for the corresponding values of the x, t and z statistics.

Starting with Table I, "The deviation in the normal distribution in terms of the standard deviation", we see that the values of x are given as positive numbers only. So, when observing x of ~1.64, one adds the column heading (.10) to the row heading (.00) and finds P = 0.10. When observing x of ~-1.64, one first has to take the absolute value, locate it in the table and read off the same P. Only in the explanatory text below the table is it mentioned that "x is the deviation such that the probability of an observation falling outside the range from -x to +x is P". This corresponds to a test of a point null against a two-sided alternative hypothesis.
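As a rough illustration, a Table I lookup amounts to the short Python sketch below. It uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of the printed table, and the function name `table_i_p` is hypothetical, chosen only for this example. Because the sign of x is discarded, +1.644854 and -1.644854 return the same P of about 0.10, which is precisely the two-sided behaviour described above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def table_i_p(x: float) -> float:
    """Two-sided P as tabulated in Table I: P(|X| >= |x|).

    The sign of x is thrown away, just as a reader of the table must do.
    """
    return 2 * (1 - STD_NORMAL.cdf(abs(x)))


for x in (1.644854, -1.644854, 2.170090):
    print(f"x = {x:+.6f}  ->  P = {table_i_p(x):.4f}")
```

The last value reproduces Fisher's own example of P = .03 for x = 2.170090.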

Scans of the table are available online. Here is a partial reconstruction of the table to illustrate what Table I: "Table of x" looked like:

| | .01 | .02 | ... | .08 | .09 | .10 |
|---|---|---|---|---|---|---|
| .00 | 2.575 | 2.326 | ... | 1.750 | 1.695 | 1.644 |
| .10 | 1.598 | 1.554 | ... | 1.340 | 1.310 | 1.281 |
| ... | ... | ... | ... | ... | ... | ... |
| .70 | .371 | .358 | ... | .279 | .266 | .253 |
| .80 | .240 | .227 | ... | .150 | .138 | .125 |
| .90 | .113 | .100 | ... | .025 | .012 | .0 |

The explanatory text below the table in full: "The value of P for each entry is found by adding the column heading to the value in the left-hand margin. The corresponding value of x is the deviation such that the probability of an observation falling outside the range from -x to +x is P. For example, P = .03 for x = 2.170090; so that 3 per cent of normally distributed values will have positive or negative deviations exceeding the standard deviation in the ratio 2.170090 at least."

There is **no explicit guidance** on how one is to treat x if one
is interested in the probability of an observation falling above +x only, or
below -x only (which is what the two most commonly used one-sided nulls,
"difference ≤ 0" and "difference ≥ 0", call for), or in any other range, for that matter.
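The conversion the table leaves unstated can be sketched in a few lines of Python, again assuming the standard library's `statistics.NormalDist`; the helper names `table_i_x` and `upper_tail_x` are hypothetical. The first regenerates Table I entries as x = Φ⁻¹(1 - P/2); the second gives the one-sided (upper-tail) value x = Φ⁻¹(1 - P) for comparison.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def table_i_x(p: float) -> float:
    """Deviation x with P(|X| > x) = p: the two-sided value printed in Table I."""
    return STD_NORMAL.inv_cdf(1 - p / 2)


def upper_tail_x(p: float) -> float:
    """Deviation x with P(X > x) = p: a one-sided, upper-tail value."""
    return STD_NORMAL.inv_cdf(1 - p)


for p in (0.01, 0.02, 0.05, 0.10):
    print(f"P = {p:.2f}: two-sided x = {table_i_x(p):.3f}, "
          f"one-sided x = {upper_tail_x(p):.3f}")
```

Rounded to three decimals, the two-sided values agree with the reconstruction above to within one unit in the last digit (it truncates 1.6449 to 1.644, for instance). The one-sided value for a given P is simply the Table I entry for 2P, which is exactly the step a reader of the table had to work out unaided.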

Next there is Table IV, "Table of t", in which we are again
**given only the positive values of t**, with no indication that the probabilities
listed are those for a two-sided test, and no guidance on how to convert them for
even the simple one-sided case.

Such guidance is offered in a paragraph inside the book, some
**40 pages before the table**, and it states: "If it is proposed to consider the
chance of exceeding the given values of t, in a positive (or negative)
direction only, then the values of P should be halved." This is a clear
statement on how to handle one-sided questions, but it is, unfortunately,
buried many pages into a lengthy and complicated book.
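Fisher's halving rule, together with the direction check it implies, can be sketched as follows. Python's standard library has no t distribution, so this sketch assumes a plain numerical integration of the t density with composite Simpson's rule; in practice one would use a statistics library (for example SciPy's `scipy.stats.t`). All function names here are hypothetical. Note that halving only applies when t falls in the hypothesized direction; otherwise the one-sided P is 1 - P/2.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 10_000) -> float:
    """P(|T| >= |t|), the quantity tabulated in Table IV.

    Approximated with composite Simpson's rule on [0, |t|]; steps must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central


def one_sided_p(t: float, df: int, alternative: str) -> float:
    """One-sided P for H1 'greater' (P(T >= t)) or 'less' (P(T <= t))."""
    half = two_sided_p(t, df) / 2
    if alternative == "greater":
        return half if t >= 0 else 1 - half
    if alternative == "less":
        return half if t <= 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


print(round(two_sided_p(2.042, 30), 4))              # tabulated P = .05
print(round(one_sided_p(2.042, 30, "greater"), 4))   # halved: about .025
print(round(one_sided_p(-2.042, 30, "greater"), 4))  # wrong direction: about .975
```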

Scans of the table are available online. Here is a partial reconstruction of the table to illustrate what Table IV: "Table of t" looked like:

| n | P = .9 | .8 | ... | .05 | .02 | .01 |
|---|---|---|---|---|---|---|
| 1 | .158 | .325 | ... | 12.706 | 31.821 | 63.657 |
| 2 | .142 | .289 | ... | 4.303 | 6.959 | 9.925 |
| ... | ... | ... | ... | ... | ... | ... |
| 29 | .127 | .256 | ... | 2.045 | 2.462 | 2.756 |
| 30 | .127 | .256 | ... | 2.042 | 2.457 | 2.750 |
| ∞ | .12566 | .25335 | ... | 1.95996 | 2.32634 | 2.57582 |

Similar issues are present in tables V.A and V.B (correlation coefficients) and Table VI on the values of the z distribution.

Given that pages with the tables were often torn out of the
book and kept within arm’s reach for reference, it becomes painfully clear **how
easy it was to misuse them by applying two-sided significance calculations
even when these were neither warranted nor wanted**.

## How the presentation favors two-sided significance calculations

It should already be easy for a person with experience in graphic design and user experience to spot why the tables, as presented, would favor two-sided calculations, even if that person has minimal statistical knowledge. Let us make this explicit.

**1.** Aside from Table I, there is **no relevant information** on
the page on which each table is printed **showing that the P-values are calculated
for a point null of zero difference against a two-sided alternative**. This makes it more likely for users to remain **unaware**
of that fact and makes it easier for those who do know it to **forget** it.

**2.** The explanation of how to treat the one-sided case is
**buried deep in the book** and only references one of the several tables. It is
therefore likely that many users of the tables would be **unaware** of it
and, if in need of such guidance, would have to consult other sources (books,
papers, colleagues, etc.).

Imagine for a moment that you are a research worker with deep expertise in your field, but only a cursory understanding of one or two statistical procedures relevant to your daily work. Unlike the likely reader of this article, you have not had the joy of familiarizing yourself with the deep mathematical and philosophical roots of the procedures you apply. How easy would it then be to treat the probabilities as applicable to whatever null hypothesis you have at hand?

As demonstrated elsewhere ("Examples of improper use of two-sided hypotheses"), many researchers do not understand the need to define a statistical null hypothesis that corresponds to their research hypothesis, and treat any statistical answer as if it applies to their research hypothesis. As shown there, making a directional claim can be quite subtle at times, and given the practice of simply attaching a p-value to a number without explicitly specifying the null hypothesis under which it was calculated, it is no wonder that we see so many p-values detached from the conclusions they are meant to support.

The above can be shrugged off as conjecture or ex post facto
explanation, but I believe **simple empirical tests** can and will confirm it. One
way to test this would be to introduce the tables to students who are familiar
with the t and z distributions, and then ask them to apply the tables to a
couple of different problems requiring one-tailed tests, to see whether
they extract the correct p-value. To make the test more
sensitive, it would be advisable to include at least one problem in which
the observed value of the test statistic is negative, e.g. z = -1.65.
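For grading such an exercise, the expected answers for z = -1.65 can be computed directly. The snippet below is a minimal sketch using Python's `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

z = -1.65  # observed statistic from the suggested exercise

lower = NormalDist().cdf(z)      # one-sided P for H1 "less than":    P(Z <= z)
upper = 1 - lower                # one-sided P for H1 "greater than": P(Z >= z)
table_i = 2 * min(lower, upper)  # two-sided P, what a Table I lookup of |z| yields

print(f"less: {lower:.4f}  greater: {upper:.4f}  Table I (two-sided): {table_i:.4f}")
```

A student who reports the Table I value of about 0.099 for either directional question has answered incorrectly: the correct one-sided P is about 0.049 when the alternative is "less than" and about 0.951 when it is "greater than".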

A control group could use different, one-sided tables. A great
example is the set of tables provided in the US Environmental Protection Agency guide "Data
Quality Assessment: Statistical Methods for Practitioners" ^{[6]}. For example,
its Z table (A-1) is split into two sections, one for negative values of z and
another for positive ones. The p-values provided are one-sided, and above each section
is a graphical representation of the rejection region. To do a two-sided
calculation, you need to add up the p-values from the two sections, for -|z| and
for |z|, to get the two-sided p-value.

## Current situation and conclusions

Given that Fisher used one-tailed probabilities in many of his examples of applications of statistical methods, I do not believe that reporting two-tailed probabilities in the tables was any kind of deliberate attempt to discourage the use of one-tailed probabilities. It was most likely a convenient or economical decision, or simply a lack of foresight regarding the potential consequences.

Similarly, I do not think that major statistical software vendors today are deliberately making it harder to get one-tailed probabilities out of their tools by adopting two-tailed probabilities as defaults. I think it is part custom/convention and part poor UI that does little to help researchers decide which statistical test best fits their research hypothesis or question. The unintended consequence is a bias against the use of one-sided tests and a proliferation of poor matches between research and statistical hypotheses, resulting in incorrect error probabilities being reported.

In conclusion, I find strong reason to believe that the poor user experience that the tables in Fisher's "Statistical Methods for Research Workers" offered to early statisticians and research workers is at least partly responsible for present-day misconceptions about one-sided tests, as well as for the use of two-sided tests to provide error probabilities for questions that require one-sided tests. Newer software tools continue to suffer from similar issues, propagating the confusion further into the future.

P.S. In writing the above, I also realized that at a time
when these tables were the main tool of the statistician, it was not so
difficult to forget that there are null hypotheses other than the nil
hypothesis. The step from that to equating the two is small, indeed. **Similar,
but less pronounced, effects are likely present due to the default choices of present-day
statistical software**.

#### References

[1] Fisher R.A. (1925) "Statistical methods for research workers". Oliver & Boyd, Edinburgh

[2] Vyas S.A., Desai S.P. (2015) "The Professor and the Student, Sir Ronald Aylmer Fisher (1890-1962) and William Sealy Gosset (1876-1937): Careers of two giants in mathematical statistics", *Journal of Medical Biography* 23(2):98-107; https://doi.org/10.1177/0967772013479482

[3] Kendall M.G. (1963) "Ronald Aylmer Fisher, 1890–1962", *Biometrika* 50(1-2):1-15; https://doi.org/10.1093/biomet/50.1-2.1

[4] Box J.F. (1981) "Gosset, Fisher, and the t Distribution", *The American Statistician* 35(2):61-66; http://dx.doi.org/10.1080/00031305.1981.10479309

[5] Krishnan T. (1997) "Fisher's Contributions to Statistics", *Resonance: Journal of Science Education* 2(9):32-37

[6] US Environmental Protection Agency (EPA) "Data quality assessment: statistical methods for practitioners", EPA QA/G-9S, issued Feb 2006


#### Cite this article:

If you'd like to cite this online article you can use the following citation:

Georgiev G.Z., *"Is the widespread usage of two-sided tests a result of a usability/presentation issue?"*, [online] Available at: https://www.onesided.org/articles/widespread-usage-of-two-sided-tests-result-of-usability-issue.php URL [Accessed Date: 18 Jan, 2020].