Code performance in R: Parallelization

2021-06-30 by Mira Céline Klein

This is the third part of our series about code performance in R. In the first part, I introduced methods to measure which part of a given piece of code is slow. The second part lists general techniques to make R code faster. In this part, you are going to see how to take advantage of parallelization in R.

What is parallelization?

In many cases, your code performs multiple independent tasks, for example, if you run a simulation with five different parameter sets. The five processes don't need to communicate with each other, and none of them needs a result from any other. They could even run simultaneously on five different computers... or processor cores. This is called parallelization. Modern desktop computers usually have several processor cores. To find out how many cores your PC has, use the function detectCores() from the parallel package. By default, R uses only one core, but this article shows you how to use multiple cores. If your simulation needs 20 hours to complete on one core, running the five parameter sets on five cores could deliver your results in roughly four hours!
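As a quick check on your own machine, the following minimal sketch queries the number of cores. The variable names are just for illustration, and on some platforms the result can be NA.

```r
library(parallel)

# Number of logical cores (including hyper-threads) reported by the system
nLogical <- detectCores()

# Number of physical cores; can be NA where this cannot be determined
nPhysical <- detectCores(logical = FALSE)

cat("Logical cores: ", nLogical, "\n")
cat("Physical cores:", nPhysical, "\n")
```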

When should you parallelize?

The more you parallelize, the faster the code? Unfortunately, it's not that simple. Setting up the parallelization takes some time itself. For example, the computer needs to decide which core does which task. Thus, if the task itself is very fast anyway, parallelization can actually make the code slower. But if it runs for minutes or longer, parallelization is often useful. If you are not sure, measure both versions with system.time. It's usually more effective to parallelize a larger part of your code at once instead of many small functions separately. You also have to keep working memory consumption in mind - more about that in the following sections.
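One way to find out is to time a sequential and a parallel run of the same work. The sketch below is only an illustration: slow_task is a made-up stand-in for your own function, and the parallel run uses makeCluster and parLapply from the parallel package, which work on all operating systems. The timing deliberately includes starting and stopping the cluster, because that overhead is part of the cost.

```r
library(parallel)

# Hypothetical stand-in for an expensive task
slow_task <- function(i) {
  x <- rnorm(2e5)
  mean(sort(x))
}

tasks <- 1:8

# Sequential baseline
timeSeq <- system.time(resSeq <- lapply(tasks, slow_task))

# Parallel run with 2 nodes, including the setup and teardown overhead
timePar <- system.time({
  cl <- makeCluster(2)
  resPar <- parLapply(cl, tasks, slow_task)
  stopCluster(cl)
})

print(timeSeq["elapsed"])
print(timePar["elapsed"])
```

For a task as small as this one, the parallel version may well be slower. The picture changes as each task gets more expensive.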

Implementation

Now that we have talked about the potential advantages of parallelization, let's turn to the implementation. The exact implementation depends on your operating system: Parallelization on Windows works a bit differently than on Linux or Mac.

Linux and Mac

Parallelization with Linux, Mac or other Unix-based systems is *very* easy. You only need to write your code so that it contains an lapply structure. Then replace the lapply with the multicore version mclapply from the parallel package. By default, mclapply uses only two cores (more precisely, the value of getOption("mc.cores", 2L)), so you will usually want to set the number yourself with the argument mc.cores. An example where we do the same thing with five data frames dat1 to dat5 looks like this:

library(parallel)

dataList <- list(dat1, dat2, dat3, dat4, dat5)

mclapply(dataList, mc.cores = 5, function(dat) {
  # some code doing something with dat
})

The code returns a list of length five containing the results for the five data frames, just like lapply – only faster.
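To make this concrete, here is a small runnable variant under assumed conditions: the five data frames are simulated toy data, and the task (fitting a linear model per data frame) is just an example. Since forking is not available on Windows, the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical toy data: five small data frames
set.seed(1)
dataList <- lapply(1:5, function(i) {
  data.frame(x = rnorm(100), y = rnorm(100))
})

# mclapply cannot use more than one core on Windows
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# Fit one linear model per data frame and return its coefficients
coefList <- mclapply(dataList, function(dat) {
  coef(lm(y ~ x, data = dat))
}, mc.cores = nCores)

length(coefList)      # 5, one result per data frame
names(coefList[[1]])  # "(Intercept)" "x"
```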

Windows

Parallelization on Windows is a little more complicated. Again, you can use the parallel package. You first need to create a so-called "cluster" with a specific number of "nodes". This is what the makeCluster function does. Each node uses one core and can do one task at a time. Then you do your computations with parLapply, and afterwards you stop the cluster with stopCluster.

The following code draws normally distributed random samples of increasing size and computes their quantiles in parallel (although this is definitely not a task which you would need to parallelize, because it doesn't take long - just an example).

library(parallel)

myCluster <- makeCluster(4) # Cluster with 4 nodes

result <- parLapply(myCluster,
                    seq(1000, 100000, by = 1000),
                    function(x) {
                      # Some function doing some simulation
                      y <- rnorm(n = x)
                      quantile(y)
                    })

stopCluster(myCluster)

Apart from the slightly more complicated code, there is another reason why you should think twice about parallelization with Windows: working memory. With Linux, mclapply forks the running R process, so each process can access the same working memory. This means that if you start your computations with one large dataset, each process can use the same single copy of that dataset as long as it only reads it. Additional memory is needed only for (intermediate) results. In contrast, with Windows, each process needs its own copy of the dataset. This can consume a large proportion of your working memory right from the start.

Another difference is the environment where the computations take place. With Linux, your parallelized code can use any R object in your environment as well as functions from loaded packages. With Windows, each parallel process starts in a new R session that knows neither your objects nor your loaded packages. This means that you have to put every necessary object into this new environment "by hand" with the function clusterExport before you call parLapply, and load the required packages on the nodes, for example with clusterEvalQ. For details, see ?clusterExport.
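A minimal sketch of how this can look is shown below. The objects scaleFactor and rescale are made up for illustration. clusterEvalQ, also from the parallel package, evaluates an expression on every node and is one way to load packages there.

```r
library(parallel)

# Hypothetical objects that the parallel code depends on
scaleFactor <- 10
rescale <- function(x) x * scaleFactor

myCluster <- makeCluster(2)

# Copy the named objects from the current environment to every node
clusterExport(myCluster,
              varlist = c("scaleFactor", "rescale"),
              envir = environment())

# Load packages on every node (stats is loaded anyway; shown as an example)
clusterEvalQ(myCluster, library(stats))

result <- parLapply(myCluster, 1:4, function(i) rescale(i))

stopCluster(myCluster)

unlist(result)  # 10 20 30 40
```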

How many cores should you use?

The optimal number of cores depends on several aspects. If you still want to use your computer for anything else during the computations (e.g., checking your emails or writing a Word document), you should, of course, reserve at least one core for those remaining tasks. Another important point is working memory: If you start multiple tasks, each of them needs enough memory of its own to store its (intermediate) results. Often the limiting factor is not the number of cores, but the working memory.
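As a rough starting point, you could derive the number of cores from detectCores() and look at the size of your data before you start. The following sketch uses a simulated dataset dat as a placeholder. It is a rule of thumb, not a guarantee that the memory will suffice.

```r
library(parallel)

nAvailable <- detectCores()

# detectCores() can return NA, so fall back to a single core in that case
if (is.na(nAvailable)) nAvailable <- 1L

# Keep one core free for other work, but never go below one
nCores <- max(1L, nAvailable - 1L)
nCores

# Rough memory footprint of one (hypothetical) dataset; on Windows,
# every node needs its own copy of it
dat <- data.frame(x = rnorm(1e5))
print(object.size(dat), units = "MB")
```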
