Posts Tagged ‘stupid’

Write unit tests!

7 Dec 2015

Since 2000, I’ve been working on R/qtl, an R package for mapping the genetic loci (called quantitative trait loci, QTL) that contribute to variation in quantitative traits in experimental crosses. The Bioinformatics paper about it is my most cited; also see my 2014 JORS paper, “Fourteen years of R/qtl: Just barely sustainable.”

It’s a bit of a miracle that R/qtl works and gives the right answers, as it includes essentially no formal tests. The only regular tests are that the examples in the help files don’t produce any errors that halt the code.

I’ve recently been working on R/qtl2, a reimplementation of R/qtl to better handle high-dimensional data and more complex crosses, such as Diversity Outbred mice. In doing so, I’m trying to make use of the software engineering principles that I’ve learned over the last 15 years, which pretty much correspond to the ideas in “Best Practices for Scientific Computing” (Greg Wilson et al., PLOS Biology 12(1): e1001745, doi:10.1371/journal.pbio.1001745).

I’m still working on “Make names consistent, distinctive, and meaningful”, but I’m doing pretty well on writing shorter functions with less repeated code, and particularly importantly I’m writing extensive unit tests.
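To make the distinction concrete, here is a minimal sketch of what I mean, in Python rather than R and with a made-up function: `count_crossovers` is purely hypothetical and is not part of R/qtl or R/qtl2. Running an example only tells you that the code doesn't halt; a unit test states what the answer should be.

```python
def count_crossovers(genotypes):
    """Count genotype switches along a chromosome, skipping missing values (None).

    Hypothetical helper for illustration only; not an R/qtl function.
    """
    observed = [g for g in genotypes if g is not None]
    return sum(1 for left, right in zip(observed, observed[1:]) if left != right)


# "Example runs without error" style check: only proves the call doesn't crash.
count_crossovers(["A", "B", "B", "A"])


# Unit tests: each one pins down the expected answer for a specific case.
def test_no_switches():
    assert count_crossovers(["A", "A", "A"]) == 0


def test_two_switches():
    assert count_crossovers(["A", "B", "B", "A"]) == 2


def test_missing_genotypes_are_skipped():
    assert count_crossovers(["A", None, "A", None, "B"]) == 1


def test_empty_input():
    assert count_crossovers([]) == 0
```

A test runner such as pytest would collect the `test_` functions; the edge cases (missing data, empty input) are the ones an example in a help file tends never to exercise.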

God-awful conference websites

5 Aug 2015

What do I want in a conference website? Not this.

  • I want to be able to browse sessions to find the ones I’m interested in. That means being able to see the session title and time as well as the speakers and talk titles. A super-long web page is perfectly fine.
  • If you can’t show me everything at once, at least let me click-to-expand: for the talk titles, and then for the abstracts. Otherwise I have to keep clicking and going back.
  • I want to be able to search for people. And if I’m searching for Hao Wu, I don’t want to look at all of the Wus. Or all of the Haos. I just want the Hao Wus. If I can’t search on "Hao Wu", at least let me search on "Wu, Hao".
  • If my search returns nothing and I go back, bring me back to the same search form. Don’t make me have to click “Search for people” again.
  • I’d like to be able to form a schedule of the sessions to attend. (JSM2015 does that okay, but it’s not what I’d call “secure” and you have to find the damned things, first.) Really, I want to pick particular talks: this one in that session and that one in the other. But yeah, that seems a bit much to ask.

The JSM 2015 site is so terrible for browsing, I was happy to get the pdf of the program. (Good luck finding it on the website on your own; ASA tweeted the link to me, due to my bitching and moaning.) You can browse the pdf. That’s the way I ended up finding the sessions I wanted to attend. It also had an ad for the JSM 2015 mobile app. Did you know there was one? Good luck finding a link to that on their website, either.

The pdf is useable, but much like the website, it fails to make use of the medium. I want:

  • Bookmarks. I want to jump to where Monday’s sessions start without having to flip through the whole thing.
  • Hyperlinks. If you don’t include the abstracts, with links from the talk titles to the abstracts, at least include links to the web page that has the abstract so I don’t have to search on the web.
  • More hyperlinks. The pdf has an index, with people and page numbers. Why not link those page numbers to the corresponding page?

I helped organize a small meeting in 2013. The program on the web and the corresponding pdf illustrate much of what I want. (No scheduling feature, but that meeting had no simultaneous sessions.) I included gratuitous network graphs of the authors and abstracts. It’s 2015. No conference site is truly complete without interactive network graphs.


As Thomas Lumley commented below, if you search on “Wu” you get all of the “Wu”s but also there’s one “Wulfhorst”. And if you search on “Hao” you get only people whose last name is “Hao”.

He further pointed out that if you search for the affiliation “Auckland” the results don’t include “University of Auckland” but only “Auckland University of Technology”. And actually, if you search for “University of Auckland” you get nothing. You need to search for “The University of Auckland”.
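The search behavior I'd want is not hard to sketch. The following is a rough illustration in Python, not a description of how the JSM site works: the function names are my own inventions, and the people and affiliations are sample data (partly made up) just to exercise the cases above. Names are matched on whole words in either order, and affiliations are matched as case-insensitive substrings.

```python
def normalize_name(text):
    """Turn 'Wu, Hao' or 'Hao Wu' into a lowercase set of name tokens."""
    return set(text.replace(",", " ").lower().split())


def search_people(query, people):
    """Return people whose names contain every token in the query.

    Tokens are compared as whole words, so 'Wu' does not match 'Wulfhorst'.
    """
    wanted = normalize_name(query)
    return [p for p in people if wanted and wanted <= normalize_name(p)]


def search_affiliations(query, affiliations):
    """Case-insensitive substring match on the affiliation string."""
    needle = query.strip().lower()
    return [a for a in affiliations if needle and needle in a.lower()]


# Sample data for illustration only (first names other than Hao Wu are made up).
people = ["Hao Wu", "Wei Wu", "Li Hao", "A. Wulfhorst"]
affiliations = ["The University of Auckland", "Auckland University of Technology"]

print(search_people("Hao Wu", people))    # same result as "Wu, Hao"
print(search_people("Wu, Hao", people))
print(search_people("Wu", people))        # the Wus, but not Wulfhorst
print(search_affiliations("University of Auckland", affiliations))
```

Matching on a set of tokens means the query order and the comma don't matter, and the substring match on affiliations means nobody has to guess whether the official name starts with "The".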

If I could do it over again, I’d self-publish

12 Aug 2014

In 2009, Śaunak Sen and I wrote a book about QTL mapping and the R/qtl software. We started working on it in the fall of 2006, and it was a heck of a lot of work.

We’d talked to several publishers, and ended up publishing with Springer. John Kimmel was the editor we worked with; I like John, and I felt that Springer (or John) did a good job of keeping prices reasonable. We were able to publish in full color with a list price of $99, so that on Amazon it was about $65. (In April, 2013, there was a brief period where it was just $42 at Amazon!)

Springer did arrange several rounds of reviews; they typically pay reviewers $100 or a few books. But the copy editing was terrible (at the very least, you want a copy editor to read the book, and it was pretty clear that our copy editor hadn’t), and the actual type-setting and construction of the index was left to us, the authors.

It feels nice to have written a proper book, but I don’t think it makes that big of a difference, for me or for readers.

And John Kimmel has since left Springer to go to Chapman & Hall/CRC, and Springer has raised the price of our book to $169, so it’s now selling for $130 at Amazon. I think that’s obnoxious. It’s not like they’ve gone back and printed extra copies, so it’s hard to see how their costs could have gone up. But in the publishing agreement we signed, we gave Springer full rights to set the price of the book.

(Update: it’s now listed at $199, though it’s still about $130 at Amazon.)

I have a hard time recommending the book at that price; I’m tempted to help people find pirated PDFs online. (And seriously, if you can’t find a pirated copy, you should work on your internet skills.)

I corresponded with an editor at Springer about why our book has become so expensive and whether there’s anything we can do about it. They responded:

  • If we do a new edition, it could be listed as $129.
  • If the book is adopted by university classes, “the pricing grid it is based on would have lower prices.”
  • Our book is available electronically, for purchase by chapter as well.

Purchase by chapter? Yeah, for $30 per chapter!

Springer has published books and allowed the authors to post a PDF, but only for really big sellers, and ours is definitely not in that category.

I’m both disgusted and embarrassed by this situation. If I could do it all over again, I’d self-publish: post everything on the web, and arrange some way for folks to have it printed cheaply.

I still don’t like it

9 Feb 2014

I got a book in the mail this week, a book I hadn’t ordered and would never have ordered. The publisher sent me a complimentary copy, as I’d reviewed the book proposal last year. (It’s the one where the author refused to allow me to have an electronic copy.)

Actually, I soundly trashed the proposal in my review. In the nicest possible way, of course. For example, I said:

And then there are things that are just plain wrong. For example, “We then express our confidence in the H0 with a p-value, which might crudely be considered the probability that the H0 is true.” That is not a crude interpretation of the p-value; that is just wrong.

It seems like if a reviewer says, “This particular book should not be adopted,” the publisher can interpret that to also mean, “and whatever you do, don’t send me a copy.”

Things to avoid as a new faculty member

5 Dec 2013

The transition from graduate student or postdoc to tenure-track faculty member is hard. You discover that there are a ton of new things to learn.

Here are some thoughts on things to avoid.

  • You may see lots of ways in which your department could be improved; don’t try to fix all of them at once.
  • You may see needs for many new courses. Don’t try to develop all of them. Try to teach the same courses at least three years in a row.
  • Don’t agree to work with just any student who asks. While a good student can really help, a bad student can suck up all your time and energy. Students may be worried about money and feigning interest in order to get a research position, but if they’re not really interested, they won’t make much progress and it will be bad for both of you.
  • Don’t agree to collaborate with someone in the two weeks before a grant deadline. You might get stuck in a commitment with some total jerk. Find out if you’d really enjoy working with them, first.
  • Don’t agree to write a book chapter. It’s almost as much work as a formal paper, but not as many people will read it and it won’t count for much on your CV.

It was a really bad idea to use slides in that class

8 Oct 2013

I gave a presentation in the Statistical Consulting course at UW–Madison today. I’ve done so a number of times in the past 6 years. Until today, I’d just spoken informally from a few pages of notes. (Earlier this year, I wrote up those notes as a blog post.)

This year, just 45 min before the class, I thought I’d quickly create some slides to present. I thought it’d be an interesting “experiment” (not in the formal sense).

The outcome was pretty clear: It was easy to create a bunch of bullet-point-based slides. They look nice. (See the pdf here; source here.)

But the slides themselves were worse than useless: they were unnecessary, and they interfered with the desired informal nature of the discussion.

I won’t be using those slides again. I’ll go back to just talking from notes.

Fortunately, the students were really good and involved and asked great questions, anyway. So no real harm done.

Complaints about the NIH grant review process

2 Oct 2013

Earlier this week, I met with a collaborator to discuss what to do with our NIH grant proposal, whose “A1” was “unscored” (i.e., the revised version, after which you don’t get a third try, received a “preliminary score” in the lower half, and so it was not discussed by the review panel and couldn’t be funded).

NIH proposals are typically reviewed by three people and given preliminary scores on five aspects (significance, approach, investigators, environment, innovation) and overall, and the top proposals based on those scores are discussed and scored by the larger panel.

One of the reviewers gave our proposal an 8 for “approach” (on a scale of 1-9, with 1 being good and 9 being terrible) with the following review comments:

4. Approach:

  • Well described details for mining of [data] and genotyping of [subjects].


  • There is no power analysis for Aim 2. Without knowing which and how many [phenotypes] will be evaluated it is not possible to estimate the statistical power.

Valid comments, but is that really all the reviewer had to say? What about Aims 1 and 3, or the other aspects of Aim 2? That is totally fucking inadequate.

Looking at this review again, I was reminded of how much I despise many aspects of the NIH review process. So it’s led me, finally, to write down some of the things that annoy me.

Department websites

11 Sep 2013

I was thinking about department websites, partly because my own department’s website is terrible, and recently a colleague asked me whether I could suggest some good department sites.

I’ll describe the basic principles for a good department website, and then I’ll comment on a number of examples.

But first: No discussion of academic web pages is complete without referring to the xkcd comic on University websites, so let’s start with that:

xkcd comic: University Websites

$18 for a two page PDF? I still don’t get it.

2 May 2013

Yesterday, I saw this tweet by @Ananyo:

Time that biologists stopped telling the public oversimplistic fairy tales on Darwinian evolution, says P Ball ($)…

So I clicked the link to the Nature paper and realized, “Oh, yeah. I’ve got to enter through the UW library website.”

But then I thought, “Wait…$18 for a two-page Nature comment? WTF?”

So I tweeted:

DNA: Celebrate the unknowns, like this Nature comment, which costs $18.…

And thinking about it some more, I got more annoyed, and tweeted:

Why do publishers charge such high per-article fees? At $18/artcl, you’d have to be desperate or stupid to pay; at $1-2, prob’ly lots would.

And then I thought, I’ll ask Nature directly:

@NatureMagazine Why is the per-article charge so high? It seems like you’d make more profit at $2/article.

And they responded:

@kwbroman For a while now, individual papers can be rented through @readcube for $3-5. A full tablet subscription to Nature costs $35.

But that didn’t quite answer my question. So I asked:

.@NatureMagazine So is the $18 charge for a 2 pg PDF just to discourage piracy?

I thought a lot about whether to put “piracy” in quotes or not, or whether to write “copyright infringement” instead.

But anyway, they responded:

@kwbroman just as with any product, the more you buy, the more you save. Media/publishing subscriptions have worked this way for decades.

That again didn’t quite answer my question.

It’s a scam

I still don’t understand the $18 business. It’s not “The more you buy, the more you save.” It’s, “Buy the whole season for $35, or buy 5 min from Episode 1 for $18.”

I understand that the cover price of Wired is $5 per issue, while I could get a year’s subscription for $15-20. But that’s not the same as $18 for one article vs $200 per year.

The $18 for a two-page PDF is like 900 numbers and paycheck advances. These are scams taking advantage of desperate or stupid people.

If they don’t want to sell the PDFs for individual articles for a reasonable price, they should just not sell them at all.

Methods before results

29 Apr 2013

It’s great that, in a step towards improved reproducibility, the Nature journals are removing page limits on Methods sections:

To allow authors to describe their experimental designs and methods in enough detail for others to interpret and replicate them, the participating journals are removing length restrictions on Methods sections.

But couldn’t they include the Methods section in the pdf for the article? For example, consider this article in Nature Genetics; the Methods section is only available in the html version of the paper. The PDF says:

Methods and any associated references are available in the online version of the paper.

Methods are important.

  • They shouldn’t be separated from the main text.
  • They shouldn’t be placed after the results (as so many journals, including PLoS, do).
  • They shouldn’t be in a smaller font than the main text (as PNAS does).
  • They certainly shouldn’t be endnotes (as Science used to do).

Supplements annoy me too

I love supplemental material: authors can give the full details, and they can provide as many supplemental figures and tables as they want.

But supplements can be a real pain.

  • I don’t want to have to click on 10 different links. Put it all in one document.
  • I don’t want to have to open Word. Put text and figures in a PDF.
  • I don’t want to have to open Excel. Put data in a plain text file, preferably as part of a git repository with related code.

At least supplements are now included at the journal sites!

This paper in Bioinformatics refers to a separate site for supplemental information:

Expression data and supplementary information are available at

But that site doesn’t exist anymore. I was able to find the supplement using the Wayback Machine, but

  • The link in the paper was wrong: It should be .html not .htm
  • The final version on Wayback has a corrupted PDF, though one can go back to previous versions that are okay.

I like Genetics and G3

Genetics and G3 put the Methods where they belong (before the results), and when you download the PDF for an article in Genetics, it includes the supplement. For a G3 article, the supplement isn’t included in the article PDF, but at least you can get the whole supplement as a single PDF.

For example, consider my recent Genetics articles:

If you click on “Full Text (PDF),” you get the article plus the 3 supplemental figures and 23 supplemental tables in the former case, and article plus the 17 supplemental figures and 2 supplemental tables in the latter case.