if ... else and ifelse

Let’s make this a quick and quite basic one. There is this incredibly useful function in R called ifelse(). It’s basically a vectorized version of the if … else control structure that every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else:

  1. It’s fast: in the benchmarks below, it is roughly three times faster than a for loop.
  2. It’s more convenient to use.

The basic idea is that you have a vector of values and want to test each of them against some condition. Depending on the outcome of the test, you want a specific value at the corresponding position in another vector. An example follows below. First, let’s load the {rbenchmark} package to see the speed benefits.

library(rbenchmark)

Now, the toy example: I am creating a vector of half a million random normally distributed values. For each of these values, I want to know whether the value is below or above zero.

x <- rnorm(500000)

ifelse() is used as ifelse(<TEST>, <OUTCOME IF TRUE>, <OUTCOME IF FALSE>), so we need three arguments. My test is x < 0 and I want to have the string "negative" in y whenever the corresponding value in x is smaller than zero. If this is not the case, then y should have a "positive" in this position. ifelse() only needs one line of code for this.

benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative", "positive")
})$user.self
## [1] 4.215
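
To see what the call actually returns, here is a minimal sketch on a tiny hand-picked vector. The names small_x and small_y are made up for illustration and are not part of the benchmark above.

```r
# Hypothetical toy input: three values instead of half a million
small_x <- c(-1.5, 0.3, 2)

# The test small_x < 0 is evaluated element by element
small_y <- ifelse(small_x < 0, "negative", "positive")

print(small_y)
## [1] "negative" "positive" "positive"
```

The result has the same length as the test, with one label per input value.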

We could also solve this with a for loop. But, as you can see, this takes approx. 3 times as long.

benchmark(replications = 50, {
  y <- c()
  # Grow y by one element per value in x
  for (i in x) {
    if (i < 0) {
      y[length(y) + 1] <- "negative"
    } else {
      y[length(y) + 1] <- "positive"
    }
  }
})$user.self
## [1] 13.021
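
Part of the loop’s cost probably comes from growing y one element at a time. As a hedged sketch (this variant was not benchmarked in this post, so no timing is claimed), the loop could preallocate the result with character() and fill it by position using seq_along(). The smaller input size and the seed are assumptions made only to keep the example quick and reproducible.

```r
set.seed(42)       # assumed seed, only for reproducibility of the sketch
x <- rnorm(1000)   # smaller input than in the post

# Preallocate the result instead of growing it inside the loop
y <- character(length(x))

for (i in seq_along(x)) {
  if (x[i] < 0) {
    y[i] <- "negative"
  } else {
    y[i] <- "positive"
  }
}

# The loop produces the same labels as the vectorized version
identical(y, ifelse(x < 0, "negative", "positive"))
## [1] TRUE
```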

The same is true for an sapply() version. To my surprise, sapply() even consistently takes a little longer than the for loop in this case.

benchmark(replications = 50, {
  y <- sapply(x, USE.NAMES = FALSE, FUN = function(i) {
    if (i < 0) {
      "negative"
    } else {
      "positive"
    }
  })
})$user.self
## [1] 15.023

It’s highly unlikely that rnorm() produces a value of exactly zero, but we can check for this by nesting calls to ifelse(): put another ifelse() in the “FALSE” part of the first one, as I did below. In this little toy example, the nested test is still considerably faster than the for or sapply() versions of the single test.

benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative",
              ifelse(x > 0, "positive", "exactly zero"))
})$user.self
## [1] 8.381
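
Since the random data will almost never contain an exact zero, the third outcome is easier to see on a small hand-made vector. This is only an illustrative sketch; small_x and small_y are hypothetical names that do not appear in the benchmark.

```r
# Hypothetical toy input that includes an exact zero
small_x <- c(-2, 0, 3.5)

# The inner ifelse() only decides the label where small_x < 0 is FALSE
small_y <- ifelse(small_x < 0, "negative",
                  ifelse(small_x > 0, "positive", "exactly zero"))

print(small_y)
## [1] "negative"     "exactly zero" "positive"
```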