Posts Tagged ‘code’

Write unit tests!

7 Dec 2015

Since 2000, I’ve been working on R/qtl, an R package for mapping the genetic loci (called quantitative trait loci, QTL) that contribute to variation in quantitative traits in experimental crosses. The Bioinformatics paper about it is my most cited; also see my 2014 JORS paper, “Fourteen years of R/qtl: Just barely sustainable.”

It’s a bit of a miracle that R/qtl works and gives the right answers, as it includes essentially no formal tests. The only regular tests are that the examples in the help files don’t produce any errors that halt the code.

I’ve recently been working on R/qtl2, a reimplementation of R/qtl to better handle high-dimensional data and more complex crosses, such as Diversity Outbred mice. In doing so, I’m trying to make use of the software engineering principles that I’ve learned over the last 15 years, which pretty much correspond to the ideas in “Best Practices for Scientific Computing” (Greg Wilson et al., PLOS Biology 12(1): e1001745, doi:10.1371/journal.pbio.1001745).

I’m still working on “Make names consistent, distinctive, and meaningful”, but I’m doing pretty well on writing shorter functions with less repeated code, and particularly importantly I’m writing extensive unit tests.
(more…)

It’s not you, it’s me

24 Sep 2015

Somehow when my code stops working, my first (and second, and third) reaction is to blame everything except my own code. (“It’s not me, it’s you.”)

And almost always, it’s my own code that’s the problem (hence the title of this post).

I spent the day trying to resolve a bug in my early-in-development R package, qtl2geno. In the process, I blamed

  • TravisCI for not handling system.file() properly.
  • R-devel for having broken system.file().
  • data.table::fread() for treating sep=NULL differently on different operating systems.

Of course, none of these were true. I was just passing sep=NULL to data.table::fread(), and that worked in the previous version, but doesn’t work in the latest release on CRAN, and I hadn’t yet installed the latest version of data.table on my Mac, but Travis and my junky Windows laptop had the latest version.

The debugging process seems a potentially interesting case study, so I thought I’d write down some of the details.

(more…)

Randomized Hobbit

22 Jun 2015

@wrathematics pointed me to his ngram R package for constructing and simulating from n-grams from text.

I’d recently grabbed the text of the hobbit, and so I applied it to that text, with amusing results.

Here’s the code I used to grab the text.

library(XML)
stem <- "http://www.5novels.com/classics/u5688"
hobbit <- NULL
for(i in 1:74) {
    cat(i,"\n")
    if(i==1) {
        url <- paste0(stem, ".html")
    } else {
        url <- paste0(stem, "_", i, ".html")
    }

    x <- htmlTreeParse(url, useInternalNodes=TRUE)
    xx <- xpathApply(x, "//p", xmlValue)
    hobbit <- c(hobbit, gsub("\r", "", xx[-length(xx)]))
    Sys.sleep(0.5)
}

Then calculate the ngrams with n=2.

library(ngram)
ng2 <- ngram(hobbit, n=2)

Simulate some number of words with babble(). If you use the seed argument, you can get the result reproducibly.

babble(ng2, 48, seed=53482175)

into trees, and then bore to the Mountain to go through?” groaned the hobbit. “Well, are you doing, And where are you doing, And where are you?” it squeaked, as it was no answer. They were surly and angry and puzzled at finding them here in their holes

Update: @wrathematics suggested that I mix two texts, so here’s a bit from the Hobbit in the Hat (The Hobbit with 59× Cat in the Hat — up-sampled to match lengths.) But there’s maybe not enough overlap between the two texts to get much of a mixture.

“I am Gandalf,” said the fish. This is no way at all!

already off his horse and among the goblin and the dragon, who had remained behind to guard the door. “Something is outside!” Bilbo’s heart jumped into his boat on to sharp rocks below; but there was a good game, Said our fish No! No! Those Things should not fly.

Error notifications from R

4 Sep 2014

I’m enthusiastic about having R notify me when my script is done.

But among my early uses of this, my script threw an error, and I never got a text or pushbullet about that. And really, I’m even more interested in being notified about such errors than anything else.

It’s relatively easy to get notified of errors. At the top of your script, include code like options(error = function() { } )

Fill in the function with your notification code. If there’s an error, the error message will be printed and then that function will be called. (And then the script will halt.)

You can use geterrmessage() to grab the error message to include in your notification.

For example, if you want to use RPushbullet for the notification, you could put, at the top of your script, something like this:

options(error = function() { 
                    library(RPushbullet)
                    pbPost("note", "Error", geterrmessage())
                })

Then if the script gives an error, you’ll get a note with title “Error” and with the error message as the body of the note.

Update: I knew I’d heard about this sort of thing somewhere, but I couldn’t remember where. Duh; Rasmus mentioned it on twitter just a couple of days ago! Fortunately, he reminded me of that in the comments below.

Another update: Ian Kyle pointed out in the comments that the above function, if used in a script run with R CMD BATCH, won’t actually halt the script. The simplest solution is to add stop(geterrmessage()), like this:

options(error = function() { 
                    library(RPushbullet)
                    pbPost("note", "Error", geterrmessage())
                    if(!interactive()) stop(geterrmessage())
                })

Notifications from R

3 Sep 2014

You just sent a long R job running. How to know when it’s done? Have it notify you by beeping, sending you a text, or sending you a notification via pushbullet.

(more…)

Why hadn’t I written a function for that?

16 Jul 2014

I’m often typing the same bits of code over and over. Those bits of code really should be made into functions.

For example, I’m still using base graphics. (ggplot2 is on my “to do” list, really!) Often some things will be drawn with a slight overlap of the border of the plotting region. And in heatmaps with image, the border is often obscured. I want a nice black rectangle around the outside.

So I’ll write the following:

u <- par("usr")
rect(u[1], u[3], u[2], u[4])

I don’t know how many times I’ve typed that! Today I realized that I should put those two lines in a function add_border(). And then I added add_border() to my R/broman package.

It was a bit more work adding the Roxygen2 comments for the documentation, but now I’ve got a proper function that is easier to use and much more clear.

Update: @tpoi pointed out that box() does the same thing as my add_border(). My general point still stands, and this raises the additional point: twitter + blog → education.

I want to add, “I’m an idiot” but I think I’ll just say that there’s always more that I can learn about R. And I’ll remove add_border from R/broman and just use box().

hipsteR: re-educating people who learned R before it was cool

15 May 2014

This morning, I started a tutorial for folks whose knowledge of R is (like mine) stuck in 2001.

Yesterday I started reading the Rcpp book, and on page 4 there’s an example using the R function replicate, which (a) I’d never heard before, and (b) is super useful.

I mean, I often write code like this, for a bootstrap:

x <- rnorm(2500)
sapply(1:1000, function(a) quantile(sample(x, replace=TRUE), c(0.025, 0.975)))

But I could just be writing

x <- rnorm(100)
replicate(1000, quantile(sample(x, replace=TRUE), c(0.025, 0.975)))

“Oh, replicate must be some new function.” Yeah, new in R version 1.8, in 2003!

I’m in serious need of some re-education (e.g., I should be using more of Hadley Wickham’s packages). Hence the beginnings of a tutorial.


Note: the title was suggested by Thomas Lumley. No connection to @hspter; it’s not really so hip. I probably should have written “geeksteR.”

Further points on crayon colors

9 May 2014

I saw this great post on crayola crayon colors at the Learning R blog, reproducing a nice graph of the Crayola crayon colors over time. (Also see this even nicer version.)

The Learning R post shows how to grab the crayon colors from the wikipedia page, “List of Crayola crayon colors,” directly in R. Here’s the code (after some slight modifications due to changes in the page since 2010):

library(XML)
theurl <- "http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors"
crayontable <- readHTMLTable(theurl, stringsAsFactors = FALSE)[[1]]
crayons <- crayontable[,grep("Hex", colnames(crayontable))]
names(crayons) <- crayontable[,"Color"]

Comparing these to what I’d grabbed, I noted one small discrepancy on the Wikipedia page: Yellow Orange was listed as "#FFAE42" but the background color for the Yellow Orange cell in the table was "#FFB653".

So I created a Wikipedia account and edited the Wikipedia page.

(Then I realized that I’d made a mistake in my edit, undid my change, thought the whole thing through again, and edited the page again.)

The Learning R post also showed a different way to sort the colors: convert to HSV, and then sort by the H then S then V. So I edited my plot_crayons() function again, to create the following picture:

Crayon colors, again

Two more points about crayon colors

8 May 2014

If you want to use crayon colors in R but you don’t want to rely on my R/broman package, you can just grab the code. Copy the relevant lines from the R/brocolors.R file:

crayons = c("Almond"="#efdecd",
            "Antique Brass"="#cd9575",
            "Apricot"="#fdd9b5",
            ...
            "Yellow Green"="#c5e384",
            "Yellow Orange"="#ffb653")

I spent a bit of time thinking about how best to sort the colors in a meaningful way, for the plot_crayons() function. But then decided to stop thinking and just do something brainless: measure distance between colors by RMS difference of the RGB values, and then use hierarchical clustering. Here’s the code from plot_crayons():

# get rgb 
colval <- t(col2rgb(crayons))

# hclust to order the colors
ord <- hclust(dist(colval))$order

It’s not perfect, but I think it worked remarkably well:

Crayon colors

“[” with the apply() functions, revisited

29 Apr 2014

I’d mentioned in the fall that one could use "[" in the apply-type functions, like this:

id <- c("ZYY-43S-CWA3", "6YU-F4B-VD2I")
sapply(strsplit(id, "-"), "[", 2)

I just realized that you can use this with matrices, too. If you have a list of matrices, you can pull out rows and columns with this technique.

z <- list(matrix(1:10, nrow=2), matrix(11:20, nrow=2))
lapply(z, "[", 1, )
lapply(z, "[", , 3)

As you can see, my data isn’t “tidy.”