Examples of Negative Portrayals of One-Sided Significance Tests

Author: Georgi Z. Georgiev, Published: Aug 6, 2018

In this article I’ve collected a decent number of portrayals of one-sided tests of significance or one-sided confidence intervals as something bad, biased, error-prone, that should never be used, etc. These are freely available sources and mostly exclude textbooks on statistics, as I do not want to commit the budget required to purchase all the different textbooks out there to examine them. If you own one and you see that it gives one-sided tests a bad name, drop me an email with the quote, making sure to cite the edition and the page on which it is found, and I will consider adding it here.

These examples present different arguments against the use of one-sided hypotheses in statistical tests. While there is a fair amount of repetition, I believe that including as many of these as possible would better illustrate the extent of the issue of misrepresentation of one-sided tests. My brief commentary for each entry should be of some educational value as well.

I. Negative portrayals in published papers in scientific journals

1. Lombardi & Hurlbert

Probably the most comprehensive paper on the topic was published by Lombardi & Hurlbert (2009) [1]. Titled "Misrepresentation and misuse of one-tailed tests" the paper's conclusion states:

"… Our analysis, in contrast, supports those few authors who have argued that prediction is never a valid justification for use of one-tailed tests. The claim that there is ‘interest only’ in results in a particular direction should be acceptable only if exceptional circumstances make it clear that the investigator truly would have been willing to disregard results strongly in the direction supposedly ‘not of any interest’ and only if such a contrary result would have been of no interest to science or society as a whole."

While I agree that prediction is not a valid justification for the use of a one-sided test, I claim that it is also entirely unneeded. There is no need for any exceptional circumstances for the investigator to be justified in asking a directional question or in making a directional statement. The paper incorrectly interprets what "interested in" means in a major work by J. Neyman (discussed by me in Fisher, Neyman & Pearson - advocates for one-sided tests and confidence intervals) and argues for a nebulous "collective interest criterion" which does not really have any bearing on the use of one-sided tests. The authors fail to make the distinction between the outcome space and the sample space and this leads to the illogical conclusion of the paper.

2. Goodman

Goodman (1988) [2] is another example of mistaking the null hypothesis for the nil hypothesis and, possibly following from this, a failure to understand how a less precise null can be overthrown with less uncertainty given the same set of data:

"Investigators who do one-sided sample size calculations should simply be told that by standards of evidence, they cannot inflate the apparent power of the experiment by relabeling the Z scores with lower p values. Or, to crudely paraphrase Gertrude Stein, a Z = 1.7 by any other name would be as weak."

From a false premise, a wrong conclusion naturally follows: "…if we are going to report p values as a summary of the data, they should always be two-sided". A p-value should be reported in the context of the null under which it was generated, otherwise it becomes uninterpretable. Whether it makes sense to also report a "standard" p-value for a null that is rarely of practical or scientific interest is a topic for another discussion.
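To make the arithmetic concrete, the following is a minimal sketch in Python (standard library only) of the two p-values attached to the Z = 1.7 from the quote. It assumes a standard normal reference distribution; the variable names and the wording of the two nulls are illustrative assumptions, not taken from Goodman's paper.

```python
from statistics import NormalDist

z = 1.7  # the Z score mentioned in the quote
std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Null "true effect <= 0": only large positive z values count against it.
p_one_sided = 1 - std_normal.cdf(z)

# Nil (point) null "true effect == 0": deviations in either direction count.
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-sided p (null: effect <= 0): {p_one_sided:.4f}")
print(f"two-sided p (null: effect == 0): {p_two_sided:.4f}")
```

The same Z score yields roughly 0.045 under the one-sided null and roughly 0.089 under the nil hypothesis. Neither number is a relabeling of the other: each is only interpretable next to the null it was computed under.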

3. Hick

Hick (1952) [3] writes that "It is far better to carry out the statistical work in the ordinary way, without tampering with the critical region, and let prior probability (or any other relevant consideration) determine what level of significance you will regard as decisive.

You can still believe in a positive deviation more readily than a negative one, if you like; but you do not confuse the issue for others who may have different ideas."

This is, I believe, a conflation of the null hypothesis with a nil hypothesis. When the null hypothesis for which the p-value was calculated is made explicit, there should be no issue for the reader, especially given that they are likely to interpret your p-value as supporting an effect in the observed direction anyway. Adjusting the critical region so that it corresponds to the question asked is by no means deserving of the "tampering" qualification.

As already said, one can always report a p-value under the nil hypothesis alongside that of the null hypothesis of interest, if deemed necessary.

4. Burke

In Burke (1953) [4] we read the following conclusion and recommendation: "We counsel anyone who contemplates a one-tailed test to ask of himself (before the data are gathered): "If my results are in the wrong direction and significant at the one billionth of 1 per cent level, can I publicly defend the proposition that this is evidence of no difference?" If the answer is affirmative we shall not impugn his accuracy in choosing a one-tailed test. We may, however, question his scientific wisdom."

Here Burke does not allow for inference about a one-sided hypothesis in the direction of the observed effect, so according to him one cannot ask a question opposite to the one initially considered of primary interest when designing the study. I do not see why this has to be the case and why the data should not be put to the best use possible. After all, neither the performance of a one-sided test in the opposite of the expected direction, nor the reporting of the resulting value has any effect on the sample space. Given a different outcome space the resulting probability or type I error threshold that is reported will remain accurate.

It is a different matter entirely how the researcher will interpret the surprising data in light of the available prior information on the topic, but I assume that it will not result in the same paper and conclusions that a result in the expected direction would have produced.

The advice by Burke also mistakenly transforms a one-sided claim of difference in a particular direction to one of "no difference". What the scientist ought to do in that case is report a one-sided p-value of 1 billionth of 1 per cent against the null hypothesis for the direction opposite to the observed effect. If it is deemed relevant, the 2 billionth of 1 per cent p-value under the point null hypothesis (nil hypothesis) can also be stated, fully negating Burke’s critique.
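The relationship between these p-values can be sketched as follows. This is an illustration under a normal approximation, with a hypothetical observed z statistic and hypothetical labels for the three nulls; it is not a reconstruction of any analysis in Burke's paper.

```python
from statistics import NormalDist

def p_values(z):
    """Return p-values for three different nulls given one observed z."""
    phi = NormalDist().cdf
    return {
        "null: effect <= 0": 1 - phi(z),  # evidence for a positive effect
        "null: effect >= 0": phi(z),      # evidence for a negative effect
        "null: effect == 0": 2 * min(phi(z), 1 - phi(z)),  # nil hypothesis
    }

z_observed = -4.0  # hypothetical: strong result in the unexpected direction
results = p_values(z_observed)
for null, p in results.items():
    print(f"{null}: p = {p:.3g}")
```

All three values come from the same data and the same sampling distribution. A result far in the unexpected direction gives a very small p-value against "effect >= 0", a p-value twice that size against the nil hypothesis, and a p-value near 1 against "effect <= 0", so nothing forces the researcher to call it "no difference".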

5. Kimmel

Kimmel (1957) [5] proposes three criteria for the appropriateness of one-tailed tests in psychological research: "1. Use a one-tailed test when a difference in the unpredicted direction, while possible, would be psychologically meaningless.", "2. Use a one-tailed test when results in the unpredicted direction will, under no conditions, be used to determine a course of behavior different in any way from that determined by no difference at all.", and "3. Use a one-tailed test when a directional hypothesis is deducible from psychological theory but results in the opposite direction are not deducible from coexisting psychological theory. If results in the opposite direction are explainable in terms of the constructs of existing theory, no matter how divergent from the experimenter's theoretical orientation this theory may be, the statistical hypothesis must be stated in a way that permits evaluation of opposite results."

I do not know what it means for a difference in the unpredicted direction to be "psychologically meaningless" and I do not find Kimmel’s example satisfactory. If nothing else, an unexpected result can be grounds for future research focused on explaining it.

The other two conditions only make sense assuming a single statistical hypothesis can be examined for a single set of data, which is, of course, not a good way to make use of scarce resources like experimental data. One can always make a statistical test for a direction different than the one predicted, without incurring any statistical penalties. Stating a one-sided research hypothesis in one direction in, say, a preregistration, does not prohibit one from performing a statistical analysis on a one-sided alternative in the opposite direction, hence #3 is redundant as evaluation of opposite results is always permitted. The fact that the stated research hypothesis might need to be revised in light of data is a different question not concerning the statistical hypothesis. In my opinion stating a non-directional research hypothesis or question of interest is, in fact, a cop-out tactic: no matter what the data says, you’d almost always be right, unless the nil hypothesis is in fact, correct, which is rarely the case.

6. Bland & Altman

Bland & Altman (1994) [6], based on their example of an appropriate use of a one-sided test, basically state that such tests are appropriate if the true value cannot deviate in one of the directions. They say that a two-sided test is "inappropriate" in such a case, but I would say that it is mathematically impossible under a default null of no difference.

In the same paper we read: "In general a one sided test is appropriate when a large difference in one direction would lead to the same action as no difference at all.", however this is not a good justification for conducting a one-sided test. A one-sided test is appropriate when the action taken depends on the direction of the result. Therefore, if the hypothesis of negative difference or no difference is ruled out, it would lead to a particular action or conclusion. If a positive difference or no difference is ruled out, this would lead to another action or conclusion instead, as is almost always the case.

Finally, we read that "Two sided tests should be used unless there is a very good reason for doing otherwise. If one sided tests are to be used the direction of the test must be specified in advance. One sided tests should never be used simply as a device to make a conventionally non-significant difference significant." If by "good reason" one understands "there is a directional question at hand", then I’d agree, but I have significant doubts this is what the authors meant. One can do a one-sided test without specifying it in advance without any compromise to the integrity of the results obtained.

There is no way to make a "non-significant difference significant" by use of a one-sided test, since a one-sided test corresponds to a different, broader question, whereas a two-sided test answers a much more specific one. Stating a probability related to one question has no bearing on the probability related to a different question and thus cannot possibly make it significant.

7. L.D. Fisher

In Fisher L.D. (1991) [7] "The use of one-sided tests in drug trials: an FDA advisory committee member's perspective." we read:

"The use of one-sided or two-sided tests in drug trials to evaluate new compounds is considered. For drugs that may be tested against placebos, with two positive trials required (as in the United States), it is argued that from both a regulatory and pharmaceutical industry perspective, one-sided tests at the 0.05 significance level are appropriate. In situations where only one trial against placebo may be done (for example, survival trials), one-sided tests at the 0.025 level are appropriate in many cases. For active control trials it is argued that two-sided tests are usually appropriate."

It seems like the author is making the statements from the position that prior knowledge or prediction of the effect is required in order to conduct one-sided hypothesis tests, while no such requirement is necessary for asking a directional question, which is what a one-sided test is equivalent to. It sounds like the author is under the impression that the prediction has an effect on the sample space.

8. Ruxton & Neuhäuser

The 2010 paper (Ruxton & Neuhäuser, 2010) [8] argues that one-sided tests are only acceptable if two conditions are satisfied: only if the authors can explain why an effect in one direction is more interesting than an effect in the other, and also why they would treat a large observed difference in the unexpected direction as no evidence for rejection of the null hypothesis.

This is yet another example of artificial requirements for one-sided tests, as if directional claims are somehow out of the ordinary scientific practice and not the general standard. In the first requirement I see lack of understanding of the terms "hypothesis" and "claims", while in the second there is a clear conflation of the null hypothesis with the nil hypothesis. Neither of these is a novel issue and both are addressed in articles on OneSided.org.

9. Moyé & Tita

In Moyé & Tita (2002) [9] we see many of the straw men against one-sided tests repeated. Claims like "Rather than reflecting the investigators’ a priori intuition, the type I error should reflect the uncertainty of the research effort’s future conclusions" betray the assumption that the researcher's intention, prediction or expectation has an effect on the sampling space. Similar sentiments are expressed when talking about researchers' beliefs and how they should not affect the choice of a test. Making a claim after an experiment has no relevance to any prior or posterior beliefs. This mistake naturally turns into an endorsement of two-tailed tests for the wrong reason: "The two-tailed hypothesis test appropriately reasserts the possibility that the investigator's belief system about an effect of therapy might be wrong".

Further into the paper we can see the false conviction that one cannot make a one-sided claim in either direction: "However, this apparent reduction in sample size in a clinical experiment produced by carrying out a one-sided hypothesis test comes at the price of being unable to draw appropriate conclusions if the investigators are wrong and the study demonstrates detrimental effects." There is nothing forbidding the researchers from claiming that the study demonstrates a harmful effect even if, as is no doubt the case (unless developing military tech!), their expectation was that it would support a positive effect. Not only that, but an ethical researcher is bound to report the result of a one-sided test for the harm alternative and, if it is significant at a reasonable risk level, no further tests of the same substance would be performed. No reasonable person would expose another group of patients to a proven (to an extent) harmful substance simply because a group of researchers were hoping that it was beneficial.

II. Bias against one-sided tests of significance in regulatory guidelines and technical recommendations

Since statistical guidelines and recommendations issued by various governmental and non-governmental bodies follow the statistical literature, one should not be surprised to find unfavorable descriptions of one-sided tests or requirements that a one-sided test should be specified, including its direction, during the registration of the experiment. Here I’ll review just a few examples of guidelines with significant importance.

1. FDA

First, the US Food and Drug Administration (FDA) in their "Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests" [10] write: "FDA recommends you report measures of diagnostic accuracy (sensitivity and specificity pairs, positive and negative likelihood ratio pairs) or measures of agreement (percent positive agreement and percent negative agreement) and their two-sided 95 percent confidence intervals."

It does not become clear from the documentation itself why two-sided intervals are required. A directional claim of the form "the observed value falls below the lower boundary of a one-sided 95% CI [0.4, +∞)" should be admissible just as an equivalently appropriate claim for a two-sided 95% CI. The guideline does not contain a firm recommendation for using two-sided tests in general and does not mention one-sided / one-tailed hypotheses.

2. EMA

Its counterpart, the European Medicines Agency (EMA) in their "Statistical Principles for Clinical Trials" [11] write:

"The issue of one-sided or two-sided approaches to inference is controversial and a diversity of views can be found in the statistical literature. The approach of setting type I errors for one-sided tests at half the conventional type I error used in two-sided tests is preferable in regulatory settings. This promotes consistency with the two-sided confidence intervals that are generally appropriate for estimating the possible size of the difference between two treatments."

The type I error in a 95% one-sided interval or one-sided significance calculation is maintained just as well as in their two-sided counterpart. How reporting a 97.5%CI as a 95%CI promotes consistency is beyond me and so is reporting a p-value of 0.02 as 0.04. We see a clear disregard of the fact that a p-value, same as a confidence interval, is uninterpretable without a specified null hypothesis.
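The relabeling described above can be sketched numerically. The snippet below is an illustration only: the estimate and standard error are hypothetical, a normal approximation is assumed, and the variable names are not taken from the EMA guideline.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal reference distribution

# Hypothetical estimate and standard error, for illustration only.
estimate, std_error = 2.0, 1.0

# Lower limits of three intervals built from the same data.
lower_one_sided_95 = estimate - std_normal.inv_cdf(0.95) * std_error
lower_one_sided_975 = estimate - std_normal.inv_cdf(0.975) * std_error
lower_two_sided_95 = estimate - std_normal.inv_cdf(0.975) * std_error

# p-values for the same data under two different nulls.
z = estimate / std_error
p_one_sided = 1 - std_normal.cdf(z)             # null: difference <= 0
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # null: difference == 0

print(f"one-sided 95% lower limit:   {lower_one_sided_95:.3f}")
print(f"one-sided 97.5% lower limit: {lower_one_sided_975:.3f}")
print(f"two-sided 95% lower limit:   {lower_two_sided_95:.3f}")
print(f"one-sided p: {p_one_sided:.4f}, two-sided p: {p_two_sided:.4f}")
```

The lower limit of the two-sided 95% interval coincides with the one-sided 97.5% bound, not with the one-sided 95% bound. With these inputs the one-sided p-value is about 0.023 and the two-sided one about 0.046, which mirrors the 0.02 versus 0.04 example in the text.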

III. Negative portrayal in books on statistics

1. "Serious Stats: A guide to advanced statistics for the behavioral sciences"

Thomas Baguley’s 2012 "Serious Stats: A guide to advanced statistics for the behavioral sciences" [12] is an example of all the issues I am trying to tackle. Box 4-1 on page 125, titled "One-sided (directional) versus two-sided (non-directional tests)" starts with:

"The advantage of a direction test is that it tends to produce smaller p values than a non-directional test and is therefore more likely to detect an effect. […] This superficially desirable property brings disadvantages.". This is patently false since "effect" refers to different things in the two types of tests. The two are not comparable (see Myth #6 in "Myths about one-tailed vs. two-tailed tests of significance" for details).

Speaking of one-sided tests: "… because the hypothesis is directional the test itself does not support inferences about the direction of an effect. If the direction of effect is in question, a two-sided, non-directional test is required.". The reverse is true! Only a one-sided test can allow you to make an inference about the direction of an effect. A two-tailed test does not, since you would be overstating the uncertainty associated with the directional claim.

We see the boogeyman of switching from two-sided tests to one-sided tests (myth #4), then some nonsense about not being able to construct one-sided confidence intervals or that they somehow defeat the purpose of confidence intervals. The box then repeats myth #3 by claiming that performing a one-sided test has anything to do with prediction of an effect in a given direction. It ends with "A one-sided test should be employed only if the direction of an effect is already known or if any outcome in the non-predicted direction would be ignored.". The first part defeats the purpose of experimental science, while the second can have a double meaning. If it is meant in the sense that under the claim of interest an outcome in the other direction will not be accepted as rejecting the null, then it is correct. However, based on the context, it should be taken to mean that the researcher should refrain from reporting or analyzing any unexpected outcomes…

2. "The Clinical Practice of Drug Information"

In Michael Gabay’s 2015 "The Clinical Practice of Drug Information" [13], page 159 in a small chapter on "One-tailed Versus Two-Tailed Tests" we read "There must be a reason to justify looking in only one direction to avoid an inappropriate conclusion. For example, if a non-inferiority study is conducted, there must be some advantage to the new intervention (e.g. decreased side effects, decreased monitoring, easier route of administration, etc.) to justify the one-sided analysis. If a superiority study is conducted, preliminary studies should have been conducted to prove noninferiority or equivalence."

The above is not true as it all depends on the claims being made. What if the superiority study turns out extremely negative results? Are you going to ignore them and pretend the experiment never occurred? Of course you will not embarrass yourself this way: you will report them with a one-sided p-value and/or CI as you should do with any directional claim. There is no need to justify why you are making a directional claim.

3. "Statistical Tests in Medical Research"

In this 2011 book by Jaykaran one can read the following: "Statistical tests can be calculated as one-tailed or two-tailed. [...] Suppose she (the researcher) wants to see the effect of a new antihypertensive drug, then she may have one of the two hypotheses. One, new antihypertensive drug will always decrease the blood pressure or may increase the blood pressure. In the former condition, one-tailed test should be used and in later condition two-tailed test should be used. But the question is how she is so much sure that it will always decrease the blood pressure? This new drug may decrease the blood pressure or may increase the same. One-tailed test should only be used when the hypothesis is based on strong scientific research. It should be used when the researcher is sure that either there will be no effect of intervention or the effect will be in one direction. If there is any confusion it is better to use two-tailed test. In medical research we should always use the two-tailed test so that, any effect observed even in opposite direction can be exploited for further research."

The above is yet another example of the mistaken belief that using a one-sided test requires a prediction of the direction of the effect in order for it to produce a valid statistical inference. Not so, as I argue. Advice like the above leads to the widespread misuse of two-tailed tests of significance.

IV. Negative portrayal of one-sided tests in online university resources

1. University of Texas, Houston

The online materials for the course "PH1835 - Statistical Methodology in Clinical Trials Fall 2015" in "School of Public Health, University of Texas, Houston, Texas" by Lem Moyé, M.D., Ph.D. definitely try to condemn one-sided tests as much as possible:

"I believe one-sided (benefit only) testing reflects a mindset of physicians and healthcare researchers who believe their intervention can produce no harm, a philosophy that has been shown repeatedly to be faulty, and dangerous to patients and their families. Investigators who agree to the one-sided (benefit only) approach to significance testing in a clinical experiment have closed parts of their minds to the possibility of harm entering into an avoidable flirtation with danger. In this sense, the one-sided test is not the disease, it is only a symptom."

In short, this is confusing the reduction of the outcome space with the reduction of the sampling space.

2. University of California, Los Angeles

In an F.A.Q. page on the website of the "Institute for Digital Research and Education" at the University of California, Los Angeles (UCLA) we can read a lot of misinformation about one-sided tests, such as:

"Because the one-tailed test provides more power to detect an effect, you may be tempted to use a one-tailed test whenever you have a hypothesis about the direction of an effect. Before doing so, consider the consequences of missing an effect in the other direction. Imagine you have developed a new drug that you believe is an improvement over an existing drug. You wish to maximize your ability to detect the improvement, so you opt for a one-tailed test. In doing so, you fail to test for the possibility that the new drug is less effective than the existing drug.  The consequences in this example are extreme, but they illustrate a danger of inappropriate use of a one-tailed test."

A one-sided test does not limit your ability to detect discrepancies in the other direction. If anything, it maximizes your ability to detect discrepancies in both directions.

They do allow for one-sided tests in a non-inferiority trial scenario, similar to EMA’s recommendations. However, then comes this:

"When is a one-tailed test NOT appropriate?

Choosing a one-tailed test for the sole purpose of attaining significance is not appropriate.  Choosing a one-tailed test after running a two-tailed test that failed to reject the null hypothesis is not appropriate, no matter how "close" to significant the two-tailed test was.  Using statistical tests inappropriately can lead to invalid results that are not replicable and highly questionable–a steep price to pay for a significance star in your results table!"

The application of a one-sided test following a two-sided test does not compromise the results in the least, nor does it make them any more questionable than they already are. It just answers a different, less specific question, which it can answer with greater precision (vs. using the same data to answer a more specific question such as the one posed by a two-sided test).

V. Negative press for one-sided tests in popular online resources

Contextual popularity was assessed by high rankings for relevant searches in Google (the Google ranking system has a significant "popularity/citation" component to it). It should be noted that negative press dominated the results at the time of extraction although I have not made an attempt at numerical estimation in terms of % of results.

1. Wikipedia

In its article on "One and two-tailed tests" the Wikipedia community has converged on the following: "A two-tailed test is appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products."

In essence it precludes the usage of a one-tailed test in most, if not all practical applications where we are interested in the difference between two or more values, be it from different experimental groups, observational groups, or baseline measurements. It incorrectly equates the tailedness of the hypothesis with the estimated value and mistakes the restrictions a one-sided hypothesis places on the outcome space with restrictions placed on the sample space.

There are also some wild claims in their article on the null hypothesis, like "Explicitly reporting a numeric result eliminates a philosophical advantage of a one-tailed test.", which in context is understood as reporting the significance of a two-sided test alongside the observed difference: "The treatment has an effect, reducing the average length of hospitalization by 1.5 days". The Discussion section of the article is a mess that needs to be cleaned up a.s.a.p.

2. Investopedia

The resource for investors, economists, traders, etc. states in its article on one-tailed tests that: "For this reason, a one-tailed test is only appropriate when it is not important to test the outcome at the other end of a distribution.". This is false since it leaves the reader with the impression that they should not care about departures in one of the possible directions. It is correct only in the sense that, given the immediate question at hand (e.g. "is X larger than Y"), one treats a departure at the other end of a distribution as equal to no departure at all. It does not preclude one from asking or caring about the outcome at the other end of a distribution and testing it with a one-sided test in the appropriate direction.

3. Mytutor.co.uk resource for tutors

In their article "What is the difference between a one-tailed or two-tailed experimental hypothesis", a part of an A-level Psychology course, we see "A one tailed hypothesis, or directional hypothesis, predicts the actual DIRECTION in which the findings will go. It is more precise, and usually used when other research has been carried out previously, giving us a good idea of which way the results will go eg we predict more or less, an increase or decrease, higher or lower

two-tailed hypothesis, or non-directional hypothesis, predicts an OPEN outcome thus the results can go in 2 directions. It is left very general and is usually used when no other research has been done before thus we do not know what will happen eg we predict a difference, an effect or a change but we do not know in what direction"

There is no need to know, predict, expect, etc. in which direction an outcome will occur. It is only necessary to have a properly formulated question. Again the outcome space vs sample space mistake.

4. Statistical Software vendor StatsDirect

In their page on p-values we read: "The only situation in which you should use a one sided P value is when a large change in an unexpected direction would have absolutely no relevance to your study. This situation is unusual; if you are in any doubt then use a two sided P value."

5. Statistical Software vendor GraphPad

In their statistics guides we read:

"A one-tailed test is appropriate when previous data, physical limitations, or common sense tells you that the difference, if any, can only go in one direction. You should only choose a one-tail P value when both of the following are true.

• You predicted which group will have the larger mean (or proportion) before you collected any data. If you only made the "prediction" after seeing the data, don't even think about using a one-tail P value.

• If the other group had ended up with the larger mean – even if it is quite a bit larger – you would have attributed that difference to chance and called the difference 'not statistically significant'."

Again, the mistake of confusing the outcome space with the sample space. There is no need for prior data, predictions, or limitations such that the data can only go in one direction. The second statement does not reflect the proper course of action in seeing a large discrepancy in the opposite direction.

6. Statistical Consultants Analysis Factor

In a blog post we see the statement: "The short answer is: Never use one tailed tests."

Plain and clear, and just as wrong. The justifications given are: "Only a few statistical tests even can have one tail: z tests and t tests." and "Probably because they are rare, reviewers balk at one-tailed tests. They tend to assume that you are trying to artificially boost the power of your test. Theoretically, however, there is nothing wrong with them when the hypothesis and the statistical test are right for them."

The first is mistaking the distribution of the outcome variable with the distribution of the statistic. The result from a Chi-Square or F-test can correspond to either a one-sided or two-sided statistical hypothesis. The fact the distribution has only one tail is irrelevant. Avoiding this confusion is one of the reasons why this site is called onesided.org and not onetailed.org.
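As a small illustration of why the number of tails of a distribution says little about the sidedness of the hypothesis, the sketch below uses the fact that the square of a standard normal variable follows a chi-square distribution with 1 degree of freedom. The z value is hypothetical and the helper name chi2_sf_1df is an assumption made for this example.

```python
import math
from statistics import NormalDist

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

z = 1.7  # hypothetical z statistic; its square is a 1-df chi-square statistic
chi2_stat = z ** 2

p_chi2_upper_tail = chi2_sf_1df(chi2_stat)           # single tail of chi-square
p_z_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails of z
p_z_one_sided = 1 - NormalDist().cdf(z)              # upper tail of z only

print(f"chi-square (1 df) upper tail: {p_chi2_upper_tail:.4f}")
print(f"z test, two-sided:            {p_z_two_sided:.4f}")
print(f"z test, one-sided:            {p_z_one_sided:.4f}")
```

In this 1-df case the single upper tail of the chi-square distribution reproduces the two-sided z-test p-value, because squaring discards the sign of the deviation. The one-sided p-value is recovered from the signed statistic, here half the chi-square tail area since the deviation is in the tested direction.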

The remark about reviewers being suspicious of one-sided claims is a statement about the reviewers’ statistical ignorance, not an argument against one-sided tests. The redeeming factor is in the last sentence from which we learn that "there is nothing wrong with them…". How this statement works next to "never use one tailed tests" is anyone’s guess.

7. StatisticsHowTo Education Resource / Book

In this online education resource accompanying a book on statistics we read: "On the other hand, it would be inappropriate (and perhaps, unethical) to run a one-tailed test for this scenario in the opposite direction (i.e. to show the drug is more effective). This sounds reasonable until you consider there may be certain circumstances where the drug is less effective. If you fail to test for that, your research will be useless.

Consider both directions when deciding if you should run a one tailed test or two. If you can skip one tail and it’s not irresponsible or unethical to do so, then you can run a one-tailed test."

8. Statistician Jerry Dallal, PhD

In his page on one sided tests: "What damns one-tailed tests in the eyes of most statisticians is the demand that all differences in the unexpected direction--large and small--be treated as simply nonsignificant. […] It is surprising to see one-sided tests still being used in the 21-st century, even in a journal as renowned as the Journal of the American Medical Association. […] Marvin Zelen dismisses one-sided tests in another way--he finds them unethical! His argument is as simple as it is elegant. Put in terms of comparing a new treatment to standard, anyone who insists on a one-tailed test is saying the new treatment cannot do worse than the standard. If the new treament has any effect, it can only do better. However, if that's the case right at the start of the study, then it is unethical not to give the new treatment to everyone!"

There is nothing damning in asking a more exact question and getting a correspondingly exact answer. It should therefore be surprising that we do not see more one-sided tests in the 21st century, not that we are seeing them at all. The cited argument by Marvin Zelen is the well-known mistake of confusing the restriction on the outcome space with a restriction on the sample space.

9. Alexander Etz, PhD

In his post on one-sided tests he states: "I have defied the rules of logic. I have concluded the stronger proposition, probability of heads > ½, but I cannot conclude the weaker proposition, probability of heads > ½ or < ½. As Royall (1997, p. 77) would say, if the evidence justifies the conclusion that the probability of heads is greater than ½ then surely it justifies the weaker conclusion that the probability of heads is either > ½ or < ½."

This is a puzzling one, as it fails to grasp that refuting that a parameter does not lie in either the left or the right tail (plus zero) is in fact a stronger position, not a weaker position, compared to refuting the claim that the parameter does not lie in just one of the tails.

To explain this through a metaphor, it is a weaker statement to state: "I do not weigh less than or equal to 80 kg" (rejecting a null of weight ≤ 80 kg) and it is a stronger statement to state: "I weigh 80 kg" (rejecting both weight < 80 kg and weight > 80 kg). The first statement excludes only weights of 80 kg or less while leaving open the possibility that I weigh 81, 82, 83 kg, etc., while the second one also excludes weights larger than 80 kg. If the citation from Royall is correct, then Royall made a logical mistake in this statement, hence the apparent paradox. This is explained more fully in my article The paradox of one-sided vs. two-sided tests of significance.
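The situation described in the coin example can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the data (59 heads in 100 flips), the 0.05 threshold, and the function names are hypothetical and do not come from the cited post. It computes exact binomial p-values for the one-sided null (probability of heads ≤ ½) and for the point null (probability of heads = ½), showing that the same data can reject the former while failing to reject the latter at the same threshold, because the two tests address different null hypotheses.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def one_sided_p(heads, n):
    """p-value for H0: P(heads) <= 1/2 against H1: P(heads) > 1/2."""
    return binom_upper_tail(heads, n)

def two_sided_p(heads, n):
    """p-value for H0: P(heads) = 1/2.

    Doubling the smaller tail is appropriate here because the null
    distribution is symmetric around n/2.
    """
    upper = binom_upper_tail(heads, n)
    lower = 1 - binom_upper_tail(heads + 1, n)  # P(X <= heads)
    return min(1.0, 2 * min(upper, lower))

alpha = 0.05            # hypothetical significance threshold
heads, flips = 59, 100  # hypothetical data

p_one = one_sided_p(heads, flips)
p_two = two_sided_p(heads, flips)

print(f"one-sided p = {p_one:.4f}, rejects H0 (<= 1/2): {p_one < alpha}")
print(f"two-sided p = {p_two:.4f}, rejects H0 (= 1/2): {p_two < alpha}")
```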

10. OnlineStatBook Project, Rice University

In an article on one-sided tests: "Some have argued that a one-tailed test is justified whenever the researcher predicts the direction of an effect. The problem with this argument is that if the effect comes out strongly in the non-predicted direction, the researcher is not justified in concluding that the effect is not zero. Since this is unrealistic, one-tailed tests are usually viewed skeptically if justified on this basis alone."

A one-sided test should never be justified by a prediction. Rather, it is justified by the question being asked.

11. Statistics Software Vendor Oracle

We see statements like "One-tailed tests should be used only when you are not worried about missing an effect in the untested direction.", that one-sided tests "Only accounts for one scenario" and that they "Can lead to inaccurate and biased results" in a blog post explaining the difference between one-tailed and two-tailed testing. All of these are demonstrably false.

12. TutorVista Educational Resource

On one-sided tests: "This type of test completely ignores any possibility of relationship in the direction which is not specified. It just focuses on the direction of interest." This mistakes the outcome space for the sample space. Source: https://math.tutorvista.com/statistics/one-sided-tests.html

13. Statistics Software Vendor XLSTAT

In a user documentation article on one-tailed and two-tailed tests: "A One-tailed test is associated to an alternative hypothesis for which the sign of the potential difference is known before running the experiment and the test". The wrong equation of the outcome space with the sample space is in full effect here. If you know the sign of the potential difference before running the experiment the statistical test becomes one of pure estimation. Also, such knowledge precludes running a previous experiment in which a directional outcome was stated, using a one-sided test.

14. ScienceR (article by Statistician Gjalt-Jorn Peters, PhD)

According to Gjalt-Jorn Peters one-sided tests in psychology are indefensible: "Employing NHST means you test under the assumption the null hypothesis (no difference) is true. Onesided testing changes this: you no longer assume the null hypothesis is not true; instead you assume that either the null hypothesis is true, or there is an effect of any possible size that is in the opposite direction of your hypothesizes effect." This is the classic mistake of equating the null hypothesis with the nil hypothesis. The nil hypothesis is just one of infinitely many possible null hypotheses, depending on the research question at hand and its expression as a statistic. Its use as a primary null hypothesis of interest is rarely justified.
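To make the distinction between a null and a nil hypothesis concrete, here is a minimal sketch of a one-sided z-test in which the boundary of the null hypothesis is a parameter rather than being fixed at zero. It assumes a normally distributed estimate with a known standard error. The function name, the estimate, the standard error, and the boundary values are all hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

def one_sided_p_value(estimate, std_error, null_boundary=0.0):
    """p-value for H0: effect <= null_boundary vs H1: effect > null_boundary.

    Uses a normal approximation; null_boundary = 0 corresponds to the
    boundary usually associated with the nil hypothesis.
    """
    z = (estimate - null_boundary) / std_error
    return 1 - NormalDist().cdf(z)

estimate, std_error = 1.2, 0.5  # hypothetical observed effect and standard error

p_zero = one_sided_p_value(estimate, std_error, 0.0)      # H0: effect <= 0
p_margin = one_sided_p_value(estimate, std_error, -0.5)   # H0: effect <= -0.5
p_minimum = one_sided_p_value(estimate, std_error, 0.5)   # H0: effect <= 0.5

print(f"H0: effect <= 0    -> p = {p_zero:.4f}")
print(f"H0: effect <= -0.5 -> p = {p_margin:.4f}")
print(f"H0: effect <= 0.5  -> p = {p_minimum:.4f}")
```

The same data yield different p-values for the different nulls, since each null corresponds to a different question being asked of the data.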

15. Discovering Statistics by Andy Field, PhD

In an article on one-tailed tests: "Many a scientist’s throat has a one-tailed effect in the opposite direction to that predicted wedged in it, turning their face red (with embarrassment). […] One-tailed tests encourage cheating."

You can read the whole argument. I think it is an amalgamation of conflating the null hypothesis with a nil hypothesis and mistaking the outcome space with the sample space.

16. PMean, Statistician Stephen D. Simon, PhD

"You're also not allowed to change you mind about which test you used once the protocol is written and approved. That's considered a protocol deviation, and a pretty serious one at that. Generally, people will use nasty words like "fraud" and "cheating" if you change from a one-sided to a two-sided hypothesis after the data is collected. Changing from a two-sided to a one-sided hypothesis is equally bad. And when you get hauled before the Ethics board, I'll testify that anyone with the good sense to compliment my webpages deserves a break." This is what Prof. Simon has to say in his article on one-tailed tests.

While it may be true that nasty words like "fraud" and "cheating" might be thrown at you, they would only convey the ignorance of the utterers, no more. When you get hauled before the equally ignorant Ethics board, get a competent statistician on your side: you might just be able to alter the policy and guidelines of whatever field of study you are practicing in, making a name for yourself in the process.

17. Study.com Online Academy

In an online lesson on the differences between one-tailed and two-tailed tests: "A one-tailed test is useful if you have a good idea, usually based on your knowledge of the subject, that there is going to be a directional difference between the variables." This statement truly defeats the purpose of a statistical test, in which we want to make minimal assumptions and let the data speak. Try saying to any critical researcher that you have inflated your p-values by assuming there is an effect in some direction… Assumptions or predictions of any sort are not a requirement or necessity in using one-sided tests. They are irrelevant.


[1] Lombardi C.M., Hurlbert S.H. (2009) "Misrepresentation and misuse of one-tailed tests", Austral Ecology 34(4):447-468; https://doi.org/10.1111/j.1442-9993.2009.01946.x

[2] Goodman S. (1988) "One-sided or two-sided p values?", Controlled Clinical Trials 9(4):387-388

[3] Hick W.E. (1952) "A note on one-tailed and two-tailed tests", Psychological Review 59(4):316-318; http://dx.doi.org/10.1037/h0056061

[4] Burke C.J. (1953) "A brief note on one-tailed tests", Psychological Bulletin 50(5):384-387; http://dx.doi.org/10.1037/h0059627

[5] Kimmel H.D. (1957) "Three criteria for the use of one-tailed tests", Psychological Bulletin, 54(4):351-353; http://dx.doi.org/10.1037/h0046737

[6] Bland J.M., Altman D.G. (1994) "Statistics Notes: One and two sided tests of significance", British Medical Journal (Clinical Trials Edition) 309(6949):248; https://doi.org/10.1136/bmj.309.6949.248

[7] Fisher L.D. (1991) "The use of one-sided tests in drug trials: an FDA advisory committee member's perspective.", Journal of Biopharmaceutical Statistics 1(1):151-156; https://doi.org/10.1080/10543409108835012

[8] Ruxton G.D., Neuhäuser M. (2010) "When should we use one‐tailed hypothesis testing?", Methods in Ecology and Evolution 1(2):114-117; http://dx.doi.org/10.1111/j.2041-210X.2010.00014.x

[9] Moyé M.D., Tita A.T. (2002) "Defending the Rationale for the Two-Tailed Test in Clinical Research", Circulation 105(25):3062-3065; http://dx.doi.org/10.1161/01.CIR.0000018283.15527.97

[10] US Food and Drug Administration (FDA): "Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests", drafted in 2003, issued on March 13, 2007.

[11] European Medicines Agency (EMA): "Statistical Principles for Clinical Trials", drafted 1997, issued Mar 1998.

[12] Baguley T. (2012) "Serious Stats: A guide to advanced statistics for the behavioral sciences" published by Macmillan International Higher Education

[13] Gabay M. (2015) "The clinical practice of drug information" published by Jones & Bartlett Learning.


Cite this article:

If you'd like to cite this online article you can use the following citation:
Georgiev G.Z., "Examples of Negative Portrayals of One-Sided Significance Tests", [online] Available at: https://www.onesided.org/articles/negative-portrayals-of-one-sided-significance-tests.php [Accessed Date: 15 Jul, 2024].