Q-value (statistics)

In statistical hypothesis testing, specifically multiple hypothesis testing, the q-value provides a means to control the positive false discovery rate (pFDR). Just as the p-value gives the expected false positive rate obtained by rejecting the null hypothesis for any result with an equal or smaller p-value, the q-value gives the expected pFDR obtained by rejecting the null hypothesis for any result with an equal or smaller q-value.

History

In statistics, testing multiple hypotheses simultaneously using methods appropriate for testing single hypotheses tends to yield many false positives: the so-called multiple comparisons problem. For example, assume that one were to test 1,000 null hypotheses, all of which are true, and to reject null hypotheses at a significance level of 0.05; due to random chance, one would expect 5% of the results to appear significant, yielding 50 false positives. Since the 1950s, statisticians had been developing methods for multiple comparisons that reduced the number of false positives, such as controlling the family-wise error rate (FWER) using the Bonferroni correction, but these methods also increased the number of false negatives. In 1995, Yoav Benjamini and Yosef Hochberg proposed controlling the false discovery rate (FDR) as a more statistically powerful alternative to controlling the FWER in multiple hypothesis testing. The pFDR and the q-value were introduced by John D. Storey in 2002 to address a limitation of the FDR, namely that the FDR is not defined when there are no positive results.
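The arithmetic behind the 1,000-test example can be checked with a short Python sketch. The function names are illustrative, and the family-wise error rate formula assumes that the tests are independent, which the text does not state.

```python
def expected_false_positives(num_true_nulls: int, alpha: float) -> float:
    """Expected number of true null hypotheses rejected at per-test level alpha."""
    return num_true_nulls * alpha


def family_wise_error_rate(num_true_nulls: int, alpha: float) -> float:
    """Probability of at least one false positive (assumes independent tests)."""
    return 1.0 - (1.0 - alpha) ** num_true_nulls


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-test significance level used by the Bonferroni correction."""
    return alpha / num_tests


m, alpha = 1000, 0.05

# Without correction: about 50 false positives, and at least one is almost certain.
uncorrected_fp = expected_false_positives(m, alpha)
uncorrected_fwer = family_wise_error_rate(m, alpha)

# With Bonferroni: each test is held to alpha / m, keeping the FWER below alpha.
per_test_level = bonferroni_threshold(m, alpha)
corrected_fwer = family_wise_error_rate(m, per_test_level)

print(uncorrected_fp, uncorrected_fwer, per_test_level, corrected_fwer)
```

The much smaller per-test level is what makes the Bonferroni correction conservative: true effects must clear a far stricter threshold, which increases false negatives.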

Definition

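Let there be a null hypothesis $H_0$ and an alternative hypothesis $H_1$. Perform $m$ hypothesis tests; let the test statistics be i.i.d. random variables $T_1, \ldots, T_m$ such that $T_i \mid D_i \sim (1 - D_i) \cdot F_0 + D_i \cdot F_1$. That is, if $H_0$ is true for test $i$ ($D_i = 0$), then $T_i$ follows the null distribution $F_0$; while if $H_1$ is true ($D_i = 1$), then $T_i$ follows the alternative distribution $F_1$. Let $D_i \sim \operatorname{Bernoulli}(\pi_1)$, that is, for each test, $H_0$ is true with probability $\pi_0 = 1 - \pi_1$ and $H_1$ is true with probability $\pi_1$. Denote the critical region (the values of $T_i$ for which $H_0$ is rejected) at significance level $\alpha$ by $\Gamma_\alpha$. Let an experiment yield a value $t$ for the test statistic. The q-value of $t$ is formally defined as

$$\inf_{\{\Gamma_\alpha : t \in \Gamma_\alpha\}} \operatorname{pFDR}(\Gamma_\alpha)$$

That is, the q-value is the infimum of the pFDR if $H_0$ is rejected for test statistics with values $\ge t$. Equivalently, the q-value equals

$$\inf_{\{\Gamma_\alpha : t \in \Gamma_\alpha\}} \Pr(D = 0 \mid T \in \Gamma_\alpha)$$

which is the infimum of the probability that $H_0$ is true given that $H_0$ is rejected.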
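The probability that the null hypothesis is true given that it is rejected can be expanded with Bayes' theorem. The following is a sketch, writing $T$ for a test statistic, $\Gamma$ for a critical region, $D = 0$ for the event that the null hypothesis is true, $D = 1$ for the event that the alternative is true, and $\pi_0 = \Pr(D = 0)$ for the prior probability of the null:

$$\Pr(D = 0 \mid T \in \Gamma) = \frac{\pi_0 \Pr(T \in \Gamma \mid D = 0)}{\pi_0 \Pr(T \in \Gamma \mid D = 0) + (1 - \pi_0) \Pr(T \in \Gamma \mid D = 1)}$$

The numerator is the prior probability of the null multiplied by the false positive rate of the region $\Gamma$; the denominator is the overall probability of a rejection. The q-value is the smallest value of this ratio over all critical regions that contain the observed statistic.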

Relationship to the p-value

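With $\Gamma_\alpha$ denoting the critical region at significance level $\alpha$, $t$ the observed value of the test statistic $T$, and $D = 0$ the event that the null hypothesis $H_0$ is true, the p-value is defined as

$$\inf_{\{\Gamma_\alpha : t \in \Gamma_\alpha\}} \Pr(T \in \Gamma_\alpha \mid D = 0)$$

the infimum of the probability that $H_0$ is rejected given that $H_0$ is true. The q-value reverses the conditioning: it is the infimum of the probability that $H_0$ is true given that $H_0$ is rejected. Comparing the definitions of the p- and q-values, it can be seen that the q-value is the minimum posterior probability that $H_0$ is true.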

Interpretation

The q-value can be interpreted as the false discovery rate: the proportion of false positives among all positive results. Given a set of test statistics and their associated q-values, rejecting the null hypothesis for all tests whose q-value is less than or equal to some threshold $\alpha$ ensures that the expected value of the false discovery rate is $\alpha$.

Applications

Biology

Gene expression

Genome-wide analyses of differential gene expression involve simultaneously testing the expression of thousands of genes. Controlling the FWER avoids excessive false positives but imposes a strict threshold for the p-value that results in many false negatives. However, controlling the pFDR by selecting genes with significant q-values lowers the number of false negatives while ensuring that the expected value of the proportion of false positives among all positive results is low.
For example, suppose that among 10,000 genes tested, 1,000 are actually differentially expressed and 9,000 are not:

If we consider every gene with a p-value of less than 0.05 to be differentially expressed, we expect that 450 of the 9,000 genes that are not differentially expressed will appear to be differentially expressed.
If we control the FWER at 0.05, there is at most a 5% probability of obtaining at least one false positive. However, this very strict criterion reduces the power so much that few of the 1,000 genes that are actually differentially expressed will appear to be differentially expressed.
If we control the pFDR to 0.05 by considering all genes with a q-value of less than 0.05 to be differentially expressed, then we expect 5% of the positive results to be false positives. This strategy enables one to obtain relatively low numbers of both false positives and false negatives.
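The numbers in this example can be worked through in a few lines of Python. The gene counts and the 0.05 threshold come from the example; the statistical power of 0.8 is a hypothetical value chosen only for illustration, and the ratio of expected counts is a rough stand-in for the expected proportion of false positives.

```python
num_genes = 10_000
num_differential = 1_000                     # genes with a real effect
num_null = num_genes - num_differential      # 9,000 genes with no effect

alpha = 0.05
assumed_power = 0.8  # hypothetical: fraction of real effects with p < alpha

# Unadjusted threshold p < 0.05.
expected_false_positives = num_null * alpha                    # 450
expected_true_positives = num_differential * assumed_power     # 800 under the assumption
false_positive_share = expected_false_positives / (
    expected_false_positives + expected_true_positives
)

# Bonferroni correction for FWER control: a much stricter per-gene threshold.
bonferroni_cutoff = alpha / num_genes

print(expected_false_positives, false_positive_share, bonferroni_cutoff)
```

Under the assumed power, roughly a third of the genes called significant at p < 0.05 would be false positives, whereas a q-value threshold of 0.05 targets a share of about 5%.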

Implementations

Note: the following is an incomplete list.

R

The qvalue package in R estimates q-values from a list of p-values.
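To illustrate what estimating q-values from p-values can look like, here is a minimal Python sketch. It is not the code of any R package: the function name `estimate_q_values` is made up for this example, and the proportion of true null hypotheses `pi0` is supplied by the caller rather than estimated from the data. With `pi0 = 1.0` the result coincides with Benjamini-Hochberg adjusted p-values.

```python
from typing import List, Sequence


def estimate_q_values(p_values: Sequence[float], pi0: float = 1.0) -> List[float]:
    """Step-up q-value estimates for p-values, given an assumed null proportion pi0."""
    m = len(p_values)
    if m == 0:
        return []
    if not 0.0 < pi0 <= 1.0:
        raise ValueError("pi0 must lie in (0, 1]")

    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    q_values = [0.0] * m
    running_min = 1.0  # q-values are capped at 1

    # Walk from the largest p-value down so that each q-value is the minimum
    # estimated FDR over all rejection thresholds at or above that p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        fdr_at_threshold = pi0 * m * p_values[idx] / rank
        running_min = min(running_min, fdr_at_threshold)
        q_values[idx] = running_min
    return q_values


p = [0.01, 0.04, 0.03, 0.005]
q = estimate_q_values(p)
# Reject every hypothesis whose estimated q-value is at most 0.05.
rejected = [q_i <= 0.05 for q_i in q]
print([round(q_i, 4) for q_i in q], rejected)
```

The running minimum makes the estimates monotone in the p-values, so a smaller p-value never receives a larger q-value.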
