Please install the following packages:
install.packages(c("foreach", "doParallel", "doRNG",
                   "snowFT", "extraDistr", "ggplot2",
                   "reshape2", "wpp2017"),
                 dependencies = TRUE)
Many statistical simulations have the following structure:
initialize.rng(...)
for (iteration in 1:N) {
    result[iteration] <- myfunc(...)
}
process(result,...)
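If calls of myfunc are independent of one another, the simulation can be transformed so that a master process distributes the iterations among several workers, each of which evaluates myfunc, and then collects the results.
There are many packages in R that work in this fashion. One of the first, snow (Simple Network of Workstations), has since been re-implemented as part of the R core package parallel (included with R since version 2.14.0).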
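The transformation described above can be sketched with parLapply from the parallel package. This is only an illustration: the body of myfunc, the value of N, and the choice of two workers are hypothetical stand-ins, not part of the original simulation.

```r
library(parallel)

# Hypothetical stand-in for myfunc: each call is independent of the others.
myfunc <- function(iteration) {
  sqrt(iteration)
}

N <- 10

# Serial version: one iteration after another.
result_serial <- numeric(N)
for (iteration in 1:N) {
  result_serial[iteration] <- myfunc(iteration)
}

# Parallel version: the master hands iterations to the workers
# and collects the results in the original order.
cl <- makeCluster(2)  # two workers, chosen arbitrarily for this sketch
result_parallel <- unlist(parLapply(cl, 1:N, myfunc))
stopCluster(cl)

# Both versions should produce the same values.
all.equal(result_serial, result_parallel)
```

A deterministic myfunc is used here so that the serial and parallel results can be compared directly; a simulation that draws random numbers would also need to set up the random number generator on each worker.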
The parallel package also provides the fork-based functions mclapply, mcmapply and mcMap.
Load the package and check how many cores you have:
library(parallel)
detectCores() # counts hyperthreaded cores
P <- detectCores(logical = FALSE) # physical cores
P
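detectCores() is documented to return NA when the number of cores cannot be determined, so a script that sizes its cluster from this value may want a fallback. A minimal sketch, in which the name n_workers and the fallback of one worker are assumptions of this example:

```r
library(parallel)

n_workers <- detectCores(logical = FALSE)

# Fall back to a single worker if the count is unknown (NA).
if (is.na(n_workers)) {
  n_workers <- 1L
}
n_workers
```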
Start and stop a pool of workers with one worker per core:
cl <- makeCluster(P)
cl
typeof(cl)
length(cl)
cl[[1]]
cl[[P]]
typeof(cl[[P]])
names(cl[[P]])
stopCluster(cl)
# cl[[1]] # gives an error
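Workers that are never stopped keep running as separate R processes, so it can help to tie the shutdown to the code that created the cluster. One possible pattern uses on.exit inside a function; the helper name with_cluster and its arguments are hypothetical, not part of the parallel package:

```r
library(parallel)

# Hypothetical helper: runs a task on a temporary cluster and
# shuts the workers down afterwards, even if the task fails.
with_cluster <- function(n_workers, task) {
  tmp_cl <- makeCluster(n_workers)
  on.exit(stopCluster(tmp_cl), add = TRUE)
  task(tmp_cl)
}

# Usage: square the numbers 1 and 2 on two workers.
squares <- with_cluster(2, function(tmp_cl) {
  unlist(clusterApply(tmp_cl, 1:2, function(x) x^2))
})
squares
```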
Socket communication (default):
cl <- makeCluster(P, type = "PSOCK")
stopCluster(cl)
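Other cluster types:
type = "FORK" (forked workers; not available on Windows)
type = "MPI" (handled by the snow package; requires Rmpi)
type = "NWS" (handled by the snow package)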
Start a cluster that will be used to solve multiple tasks:
cl <- makeCluster(P)
Let’s get each worker to generate as many normally distributed random numbers as its position in the list:
clusterApply(cl, 1:P, fun = rnorm)
The second argument is a sequence whose elements are passed, one per worker, as the first argument to the function fun. In this example, the first worker receives 1, the second 2, and so on, and each value becomes the first argument of rnorm. Thus the node cl[[4]], for example, evaluates rnorm(4, mean = 0, sd = 1).
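One way to check this mapping is to look at the length of each returned vector. The sketch below uses its own two-worker cluster, named cl_demo here so that it does not interfere with cl; both the name and the worker count are choices made for this example.

```r
library(parallel)

cl_demo <- makeCluster(2)  # two workers, an arbitrary choice for this sketch

# Element i of 1:2 becomes the first argument (n) of rnorm.
res_demo <- clusterApply(cl_demo, 1:2, fun = rnorm)
stopCluster(cl_demo)

# Number of values returned for each element: 1 and 2.
lengths(res_demo)
```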
Pass additional arguments to fun:
clusterApply(cl, 1:P, fun = rnorm,
             mean = 10, sd = 2)
A function can be evaluated more times than there are workers; clusterApply then reuses the workers in turn. Generate 20 sets of 100,000 random numbers from N(mean = 5, sd = 1) and return the average of each set:
res <- clusterApply(cl, rep(100000, 20),
                    fun = function(x) mean(rnorm(x, mean = 5)))
length(res)
head(res)
mean(unlist(res))
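To see how 20 tasks are spread over the workers, each task can report the process ID of the worker that ran it. This is a sketch on a separate two-worker cluster; the name cl_demo and the worker count are assumptions of this example.

```r
library(parallel)

cl_demo <- makeCluster(2)

# Each of the 20 tasks returns the process ID of the worker that ran it.
pids <- unlist(clusterApply(cl_demo, 1:20, fun = function(i) Sys.getpid()))
stopCluster(cl_demo)

# Tasks per worker: clusterApply assigns tasks to the workers in turn,
# so each of the two workers handles 10 of the 20 tasks.
table(pids)
```

Because the assignment is fixed in advance, a slow task holds up the tasks queued behind it in the same round. The parallel package also offers clusterApplyLB, which hands the next task to whichever worker becomes free first.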