Archive for April, 2013

Methods before results

29 Apr 2013

It’s great that, in a step towards improved reproducibility, the Nature journals are removing page limits on Methods sections:

To allow authors to describe their experimental designs and methods in enough detail for others to interpret and replicate them, the participating journals are removing length restrictions on Methods sections.

But couldn’t they include the Methods section in the PDF for the article? For example, consider this article in Nature Genetics; the Methods section is only available in the HTML version of the paper. The PDF says:

Methods and any associated references are available in the online version of the paper.

Methods are important.

  • They shouldn’t be separated from the main text.
  • They shouldn’t be placed after the results (as so many journals, including PLoS, do).
  • They shouldn’t be in a smaller font than the main text (as PNAS does).
  • They certainly shouldn’t be endnotes (as Science used to do).

Supplements annoy me too

I love supplemental material: authors can give the full details, and they can provide as many supplemental figures and tables as they want.

But supplements can be a real pain.

  • I don’t want to have to click on 10 different links. Put it all in one document.
  • I don’t want to have to open Word. Put text and figures in a PDF.
  • I don’t want to have to open Excel. Put data in a plain text file, preferably as part of a git repository with related code.

At least supplements are now included at the journal sites!

This paper in Bioinformatics refers to a separate site for supplemental information:

Expression data and supplementary information are available at
http://www.rii.com/publications/2003/HE_SDS.htm.

But rii.com doesn’t exist anymore. I was able to find the supplement using the Wayback Machine, but

  • The link in the paper was wrong: It should be .html not .htm
  • The final version on Wayback has a corrupted PDF, though one can go back to previous versions that are okay.

I like Genetics and G3

Genetics and G3 put the Methods where they belong (before the results), and when you download the PDF for an article in Genetics, it includes the supplement. For a G3 article, the supplement isn’t included in the article PDF, but at least you can get the whole supplement as a single PDF.

For example, consider my recent Genetics articles:

If you click on “Full Text (PDF),” you get the article plus the 3 supplemental figures and 23 supplemental tables in the former case, and article plus the 17 supplemental figures and 2 supplemental tables in the latter case.

Use meaningful URLs

10 Apr 2013

QR codes are stupid. See the well-known flowchart.

And I don’t like Drupal. Sites that use it give things URLs like http://www.genetics.wisc.edu/node/577 for their seminar list.

And can we get rid of the www?

“What’s your web site?”

“double-u double-u double-u …”

“Zzz…”

URLs should be meaningful and short. I like deep hierarchies of folders, but they make for long URLs.

URL-shorteners help, but you don’t really want to read out (or type) one of those short URLs. And they tell you nothing about where they’re going.

What you want is something like bcaffo.com or stodden.net. Or rqtl.org.

But…I guess you could just say “I’ll send you an email.”

And customize <title>

And while I have your attention, note that the title of your web page shows up on Google (and at the top of the browser).

It’s nice to see others make use of my html code, but you shouldn’t leave my name in the title of your publication page.

Put the important words first (not like the title for my “official page”), and perhaps nothing else. For example, the title shouldn’t include “Drupal”.

Update: Read this: “URLs are for People, not Computers”

I could have just given the URL:
http://www.not-implemented.com/urls-are-for-people-not-computers

Knuth: Journal referees should assist authors

8 Apr 2013

When serving as referee for a journal, who are you working for?

  • The editor: Will the paper add to the journal’s prestige?
  • The reader: Is it worth reading?
  • The author: How can it be improved?

I’d long thought that the referee’s duty was to the journal editors and then to the readers.

But Donald Knuth’s comments on refereeing persuaded me that I should focus primarily on helping the author to improve the manuscript.

See pages 31-35 (as numbered; actually 33-37 in the pdf) in his notes on mathematical writing. And here’s the missing page on “Hints for referees”.

Even a terrible manuscript can be published, if the author is sufficiently persistent. Your primary job as referee should be to help the author to make it as good as it can be.

Almost immediately after I first read Donald Knuth’s comments (back in 2002), I received one of the worst manuscripts I’ve ever read. It was one of those cases where I really wish the authors were anonymous, because I can’t forget who was responsible for it.

It was hard for me to say, “You have no idea what you’re doing” in a constructive way. (“You should abandon this manuscript” is not constructive, but it could be good advice. The scientific literature could use a bit more self-censorship.)

And I’ve learned to use the “Comments to the editor” as my opportunity to vent. (I would pity the poor editor on the other end, but she/he sent the thing to me!) I’d give an example of my venting, but I think I’ll leave that to another time.

Data science is statistics

5 Apr 2013

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

If you say that one kind of data analysis is statistics and another kind is not, you’re not allowing innovation. We need to define the field broadly.

You may not like what some statisticians do. You may feel they don’t share your values. They may embarrass you. But that shouldn’t lead us to abandon the term “statistics”.

Beware of grep with a list

3 Apr 2013

Another R tip: beware of as.character applied to a list.

> as.character( list(letters[1:3], letters[4:6]) )
[1] "c(\"a\", \"b\", \"c\")" "c(\"d\", \"e\", \"f\")"

Really, beware of grep with a list:

> grep("c", list(letters[1:3], letters[4:6]))
[1] 1 2

You might have thought that the result would be just 1, but grep expects a vector of character strings. If the input is anything else, grep coerces it with as.character(). Since each of the resulting strings starts with "c(", grep finds "c" in both.
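One way to get the answer you probably wanted is to search within each list element separately. This is only a sketch: the names `lst` and `hits` are mine, and I'm assuming you want the indices of the elements that contain the pattern anywhere.

```r
# A list of character vectors, as in the example above
lst <- list(letters[1:3], letters[4:6])

# Search inside each element, rather than letting grep coerce the whole list.
# fixed = TRUE treats "c" as a literal string, not a regular expression.
hits <- vapply(lst, function(v) any(grepl("c", v, fixed = TRUE)), logical(1))

which(hits)
# [1] 1
```

I use vapply rather than sapply so that the result is guaranteed to be a logical vector, even if the list is empty.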

See the related discussion (from Sept 2011) on stackoverflow.

apply vs for

2 Apr 2013

It’s widely believed that, in R programming, one should avoid for loops and always try to use apply-type functions.

But this isn’t entirely true. It may have been true for Splus, back in the day: As I recall, that had to do with the entire environment from each iteration being retained in memory.

Here’s a simple example:

> x <- matrix(rnorm(4000*40000), ncol=4000)

> system.time({
+     mx <- rep(NA, nrow(x))
+     for(i in 1:nrow(x)) mx[i] <- max(x[i,])
+  })
   user  system elapsed 
  3.719   0.446   4.164

> system.time(mx2 <- apply(x, 1, max))
   user  system elapsed 
  5.548   1.783   7.333
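That matrix takes more than a gigabyte of memory. If you want to try the comparison on a smaller machine, here is a scaled-down sketch. The matrix size and the function names (`max_for`, `max_apply`) are my own choices for illustration, and the timings will vary by machine and R version, so I'm not showing any.

```r
set.seed(1)

# 4000 rows by 400 columns: 1/100 the size of the matrix above
x_small <- matrix(rnorm(400 * 4000), ncol = 400)

# Row maxima with a for loop, filling a preallocated vector
max_for <- function(m) {
  out <- rep(NA_real_, nrow(m))
  for (i in seq_len(nrow(m))) out[i] <- max(m[i, ])
  out
}

# Row maxima with apply
max_apply <- function(m) apply(m, 1, max)

# The two approaches should give exactly the same answer
stopifnot(identical(max_for(x_small), max_apply(x_small)))

# Compare the timings on your own machine
print(system.time(max_for(x_small)))
print(system.time(max_apply(x_small)))
```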

There’s a great commentary on this point by Uwe Ligges and John Fox in the May, 2008, issue of R News (see the “R help desk”, starting on page 46, and note that R News is now the R Journal).

Also see the related discussion at stackoverflow.

They say that apply can be more readable. It can certainly be more compact, but I usually find a for loop to be more readable, perhaps because I’m a C programmer first and an R programmer second.

A key point, from Ligges and Fox: “Initialize new objects to full length before the loop, rather than increasing their size within the loop.”
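To illustrate that advice, here is a rough sketch of the two styles. The functions `grow` and `prealloc` are hypothetical names of mine, not from the Ligges and Fox article, and squaring the index is just a stand-in for real work.

```r
# Growing the result inside the loop: each c() call copies the whole vector
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Initializing to full length first, then filling in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

n <- 1e4

# Same result either way
stopifnot(identical(grow(n), prealloc(n)))

# The difference is in the time taken; try larger n to see it grow
print(system.time(grow(n)))
print(system.time(prealloc(n)))
```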

x[[c(5,3)]]

2 Apr 2013

An R tip:
Did you know that x[[c(5,3)]] is the same as x[[5]][[3]]?

I should make more thorough use of this.

In the help file for [[:

[[ can be applied recursively to lists, so that if the single index i is a vector of length p, alist[[i]] is equivalent to alist[[i1]]...[[ip]] providing all but the final indexing results in a list.

I never knew this; I came across it when playing around (i.e., not paying proper attention) in the back of the room at an R course.

Did you know that [[ had a help file? Type ?"[["
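Here is a small made-up example of the recursive indexing (the list `x` is just for illustration):

```r
# A list whose fifth element is itself a list
x <- list(1, 2, 3, 4, list("a", "b", "c"))

# Recursive indexing: fifth element, then its third element
x[[c(5, 3)]]
# [1] "c"

# Same as indexing one level at a time
identical(x[[c(5, 3)]], x[[5]][[3]])
# [1] TRUE
```

Note that this is different from x[c(5,3)] with single brackets, which returns a list containing the fifth and third elements of x.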

Thoughts on statistical consulting

2 Apr 2013

The Statistics Department at UW-Madison has a course on statistical consulting, offered each semester. I’m often asked to give a lecture, which I do in an informal way: summarizing my experiences and answering questions.

I thought it might be useful to write my thoughts on statistical consulting here: why, how, and difficulties. This will be a bit rough, and long. I’ll revert to bullet points, to be more compact (and because I’m lazy).

(more…)