fdrcontrol

The fdrcontrol package tends replace p.adjust(method=‘BH’) or p.adjust(method=‘fdr’) for a faster filtering step.

The FDRcontrol function takes in a vector of p-values and an alpha value (threshold) for controlling false discovery rate. The algorithm used was described in here.

The function return 1 when the datapoint is below the threshold and 0 if it is above the threshold.

set.seed(0)
library(fdrcontrol)
library(ggplot2)
library(dplyr)
library(tidyr)

alpha <- 0.01
no_of_test <- 100000

#simulate data
x <- rnorm(no_of_test, mean = c(rep(0, 25), rep(3, 25)))
p <- 2*pnorm(sort(-abs(x)))
df <- data_frame(p)
hist(df$p)

df %>% 
    mutate(fdr = FDRcontrol(p,alpha))%>%  #usage of the function FDRcontrol
    mutate(padj = p.adjust(p,method='BH')) %>%
    arrange(p) %>%
    mutate(idx = 1:nrow(.)) %>%
    tbl_df -> df

df %>% 
    filter(fdr==1,padj > alpha) #none of the data point disagree

## Source: local data frame [0 x 4]
## 
## Variables not shown: p (dbl), fdr (dbl), padj (dbl), idx (int)

A plot of p-values vs the ranking of the p-values is plotted below. The green line plotted the threshold at y-axis along the ranking of p-values, which follows: \[ y = \frac{\hat{i}}{m}\alpha \] where \(\hat{i}\) is the rank of the p-values, \(m\) is the no. of test that were included in the experiment and alpha is false discovery rate that we can tolerate.

df %>% 
    mutate(FDRcontrol = ifelse(fdr==1,'Passed','Failed')) %>%
    mutate(p.adjust = ifelse(padj< alpha, 'Passed','Failed'))  %>%
    select(-fdr,-padj) %>%
    gather(Method,tag,-p,-idx) %>%
    ggplot() +
        geom_point(aes(y=p,x=idx,color=factor(tag))) +
        geom_line(aes(y = idx/no_of_test * alpha, x=idx),color='green',size = 1.5,alpha=0.5) +
        facet_grid(.~Method)+
        theme(text=element_text(size=20)) +
        theme(axis.text.x = element_blank()) +
        ylim (0,0.1) +
        labs(color=' ', x = 'Index', y = 'p-value')

To compare the speed of computation:

library(microbenchmark)
microbenchmark(df %>% mutate(FDR = FDRcontrol(p,0.01)),
            df %>% mutate(padj = p.adjust(p,method='fdr')))

## Unit: milliseconds
##                                               expr       min        lq
##           df %>% mutate(FDR = FDRcontrol(p, 0.01))  2.186808  3.310369
##  df %>% mutate(padj = p.adjust(p, method = "fdr")) 16.605776 20.416440
##       mean    median        uq      max neval cld
##   3.981678  3.636127  4.042673 13.79101   100  a 
##  24.637293 23.822474 27.083515 91.59464   100   b

fdrcontrol

Dougals Wu

November 2, 2015