“[” and “[[” with the apply() functions

Did you know you can use "[" and "[[" as function names for subsetting with calls to the apply-type functions?

For example, suppose you have a bunch of identifier strings like "ZYY-43S-CWA3" and you want to pull off the bit before the first hyphen ("ZYY" in this case). (For code to create random IDs like that, see the end of this post.)

Suppose the IDs are in a vector of character strings, id.

To grab the bit before the first hyphen, I would typically use strsplit and then sapply with function(a) a[1], like so:

sapply(strsplit(id, "-"), function(a) a[1])

But in place of function(a) a[1], you can use "[", 1, as follows:

sapply(strsplit(id, "-"), "[", 1)
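
This works because sapply passes any extra arguments on to the function it calls, so each piece gets evaluated as "["(a, 1), which is just a[1]. Here's a quick check on a small hand-written vector (only the first ID is from above; the other two are made up for illustration):

```r
# Example IDs; the second and third are made up for illustration
id <- c("ZYY-43S-CWA3", "AB1-9QX-77TZ", "Q2M-L0P-ZZ81")

# The explicit anonymous function
first_a <- sapply(strsplit(id, "-"), function(a) a[1])

# The same thing, passing "[" by name; the 1 is forwarded as the index
first_b <- sapply(strsplit(id, "-"), "[", 1)

first_a                      # "ZYY" "AB1" "Q2M"
identical(first_a, first_b)  # TRUE
```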

I think that’s kind of cute. You can use "[[" the same way, if you’re working with lists.
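
As a rough sketch of the "[[" case, suppose you have a list of records, each of which is itself a list. (The recs object and its component names id and n are hypothetical, made up for this example.)

```r
# A hypothetical list of records, each itself a list
recs <- list(list(id = "ZYY-43S-CWA3", n = 3L),
             list(id = "AB1-9QX-77TZ", n = 5L))

# "[[" with a name pulls that component out of each record
ids <- sapply(recs, "[[", "id")   # character vector of the IDs
ns  <- sapply(recs, "[[", "n")    # integer vector of the counts

# "[" would instead return one-element sub-lists,
# so the result stays a list rather than simplifying to a vector
sub <- sapply(recs, "[", "n")
is.list(sub)                      # TRUE
```

So "[[" is usually the one you want when the pieces are lists and you're after the contents rather than a sub-list.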

Here’s some code to create random IDs of this form, to test out the above:

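nind <- 8                # number of IDs to create
lengths <- c(3, 3, 4)    # number of characters in each hyphen-separated part
id <- NULL
for(i in seq(along=lengths)) {
  # draw all the characters needed for part i, across all IDs
  randchar <- sample(c(LETTERS, 0:9), nind*lengths[i], replace=TRUE)
  # one row per ID; paste each row into a single string
  randstring <- apply(matrix(randchar, ncol=lengths[i]),
                   1, paste, collapse="")
  if(is.null(id)) id <- randstring
  else id <- paste(id, randstring, sep="-")
}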

5 Responses to ““[” and “[[” with the apply() functions”

  1. John Ramey (@ramhiser) Says:

    Another useful trick is:

    sapply(strsplit(id, "-"), head, n = 1)

    Of course, that is to extract only the first element. Similarly, the last element can be extracted via tail.

    • Karl Broman Says:

      Good one. I guess I should have gone after the middle bit in my example, so that head and tail wouldn’t apply.

      sapply(strsplit(id, "-"), "[", 2)

  2. firstmark Says:

    Reblogged this on m's R Blog and commented:
    [ and [[ are a little bit faster (~15%) in the case below:

    library(rbenchmark)  # assumption: benchmark() here is the one from the rbenchmark package
    prefix <- sample(LETTERS, size=100, replace=TRUE)
    id <- paste(prefix, abs(100 * rnorm(100)), sep="-")
    benchmark(sapply(strsplit(id, "-"), function(a) a[1]),
              sapply(strsplit(id, "-"), "[", 1),
              order="elapsed", replications=100)

  3. Hugh Says:

    unlist(strsplit(id, "-"))[1]
