It’s widely understood that, in R programming, one should avoid for loops and always try to use apply-type functions.
But this isn’t entirely true. It may have been true for Splus, back in the day: As I recall, that had to do with the entire environment from each iteration being retained in memory.
Here’s a simple example:
> x <- matrix(rnorm(4000*40000), ncol=4000)
> system.time({
+ mx <- rep(NA, nrow(x))
+ for(i in 1:nrow(x)) mx[i] <- max(x[i,])
+ })
   user  system elapsed
  3.719   0.446   4.164
> system.time(mx2 <- apply(x, 1, max))
   user  system elapsed
  5.548   1.783   7.333
There’s a great commentary on this point by Uwe Ligges and John Fox in the May 2008 issue of R News (see the “R help desk”, starting on page 46, and note that R News is now the R Journal).
Also see the related discussion at stackoverflow.
They say that apply can be more readable. It can certainly be more compact, but I usually find a for loop to be more readable, perhaps because I’m a C programmer first and an R programmer second.
A key point, from Ligges and Fox: “Initialize new objects to full length before the loop, rather than increasing their size within the loop.”
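To make that advice concrete, here’s a minimal sketch (the toy `i^2` computation is mine, not from the column); both loops compute the same vector, but the first grows it one element at a time while the second writes into a vector already allocated at full length:

```r
n <- 1e4

# growing the result one element at a time: each c() copies the whole vector
system.time({
  grown <- NULL
  for(i in 1:n) grown <- c(grown, i^2)
})

# pre-allocated to full length: each assignment writes in place
system.time({
  filled <- numeric(n)
  for(i in 1:n) filled[i] <- i^2
})
```

The gap widens with n, since the growing version does a quadratic amount of copying.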
3 Apr 2013 at 8:45 am
— I usually find a for loop to be more readable, perhaps because I’m a C programmer first and an R programmer second.
In a nutshell. If one were a relational database zealot, or a Prolog coder, by default, then the “set”-based approach is far more natural. And given that user-typed R code is nearly (always?) slower than internal R C code, use of set-implemented syntax is to be preferred. Better yet, if your data is in a SQL database (and you’ve wisely not eaten of Eve’s NoSQL apple, have you?), do all that munging over there first. It will save a ton of effort and headache.
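A sketch of the set-based point in R itself (my illustration, not the commenter’s; `do.call` over `pmax` is one of several vectorised routes, `matrixStats::rowMaxs` being another) — the per-element work moves from interpreted R into internal C code:

```r
x <- matrix(rnorm(1000 * 40), ncol = 40)

# row maxima via an explicit loop
mx_loop <- rep(NA_real_, nrow(x))
for(i in 1:nrow(x)) mx_loop[i] <- max(x[i, ])

# row maxima via pmax across the columns: one internal C call per column,
# element-wise max accumulated over all 40 columns
mx_set <- do.call(pmax, as.data.frame(x))

stopifnot(isTRUE(all.equal(mx_loop, unname(mx_set))))
```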
3 Apr 2013 at 2:37 pm
I guess you forgot the [i] in the mx assignment.
I mean:
> system.time({
+ mx <- rep(NA, nrow(x))
+ for(i in 1:nrow(x)) mx[i] <- max(x[i,])
+ })
3 Apr 2013 at 3:12 pm
Oops; thanks! I updated the post….
3 Apr 2013 at 11:02 pm
I also found that a loop is faster than *apply in some cases, especially if the object is very big (or the returned object is big).
For looping, we can compile the code to boost the speed. For *apply, we can make it parallel to boost the speed.
To me, if the work is computation-intensive then I would prefer apply. If the data object is large or requires nested looping, then I would prefer looping.
Here is an example based on your code :
> library(compiler)
> library(parallel)
>
> n x
> f <- cmpfun(function(x){
+ mx <- rep(NA, nrow(x))
+ for(i in 1:nrow(x)) mx[i]
> system.time({
+ mx <- rep(NA, nrow(x))
+ for(i in 1:nrow(x)) mx[i]
> system.time(mx2
> system.time({ mx3
> cl=makeCluster(detectCores())
> clusterExport(cl,"x")
>
> system.time(mx2
>
3 Apr 2013 at 11:10 pm
One more point here,
lapply would be faster than apply:
system.time(mx2 <- apply(x, 1, max))
system.time(mx2 <- lapply(1:nrow(x), function(z) max(x[z,]) ))
sorry, the code seems to be truncated. Let me try it again:
library(compiler)   # for cmpfun
library(parallel)   # for makeCluster, clusterExport, parSapply

n <- 40000
x <- matrix(rnorm(400*n), ncol=400)

# byte-compiled version of the loop; return mx so f(x) gives the result
f <- cmpfun(function(x){
  mx <- rep(NA, nrow(x))
  for(i in 1:nrow(x)) mx[i] <- max(x[i,])
  mx
})

# plain for loop
system.time({
  mx <- rep(NA, nrow(x))
  for(i in 1:nrow(x)) mx[i] <- max(x[i,])
})

system.time(mx2 <- apply(x, 1, max))
system.time(mx2 <- lapply(1:nrow(x), function(z) max(x[z,]) ))

# compiled loop
system.time({ mx3 <- f(x) })

# parallel version over a cluster of local workers
cl <- makeCluster(detectCores())
clusterExport(cl, "x")
system.time(mx2 <- parSapply(cl, 1:nrow(x), function(z) max(x[z,]) ))
stopCluster(cl)
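One caveat on the lapply timing above (my addition, not the commenter’s): lapply returns a list rather than a vector, so a like-for-like comparison with apply should unlist the result, or use vapply for a type-checked vector:

```r
x <- matrix(rnorm(1000 * 40), ncol = 40)

mx_apply  <- apply(x, 1, max)

# lapply returns a list; unlist() flattens it to a numeric vector
mx_lapply <- unlist(lapply(1:nrow(x), function(z) max(x[z, ])))

# vapply gives the same vector with an explicit result-type check
mx_vapply <- vapply(1:nrow(x), function(z) max(x[z, ]), numeric(1))

stopifnot(identical(mx_apply, mx_lapply), identical(mx_apply, mx_vapply))
```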
5 Apr 2013 at 5:42 am
Reblogged this on Easy ML World.
16 Feb 2014 at 2:56 am
Maybe this comes from the R Inferno: “failing to vectorise” is chapter 3, he makes the point about initialising at full size in ch 2 (gluttons), and he pokes fun at “Speaking R with a strong C accent” (or was it the other way around?). Anyway, he’s a popular source. Could be where the conventional wisdom comes from.