Since 2000, I’ve been working on R/qtl, an R package for mapping the genetic loci (called quantitative trait loci, QTL) that contribute to variation in quantitative traits in experimental crosses. The Bioinformatics paper about it is my most cited; also see my 2014 JORS paper, “Fourteen years of R/qtl: Just barely sustainable.”
It’s a bit of a miracle that R/qtl works and gives the right answers, as it includes essentially no formal tests. The only regular check is that the examples in the help files run without producing errors that halt the code.
I’ve recently been working on R/qtl2, a reimplementation of R/qtl to better handle high-dimensional data and more complex crosses, such as Diversity Outbred mice. In doing so, I’m trying to make use of the software engineering principles that I’ve learned over the last 15 years, which pretty much correspond to the ideas in “Best Practices for Scientific Computing” (Greg Wilson et al., PLOS Biology 12(1): e1001745, doi:10.1371/journal.pbio.1001745).
I’m still working on “Make names consistent, distinctive, and meaningful”, but I’m doing pretty well on writing shorter functions with less repeated code and, most importantly, I’m writing extensive unit tests.