Car crash stats revisited: My measurement errors

3 Nov 2014

Last week, I created revised versions of graphs of car crash statistics by state (including an interactive version), from a post by Mona Chalabi at 538.

Since I was working on those at the last minute in the middle of the night, to be included as an example in a lecture on creating effective figures and tables, I just read the data off printed versions of the bar charts, using a ruler.

I later emailed Mona Chalabi, and she and Andrew Flowers quickly posted the data to github.com/fivethirtyeight/data. (That repository has a lot of interesting data, and if you see data at 538 that you’re interested in, just ask them!)

I was curious to look at how I’d done with my measurements and data entry. Here’s a plot of my percent errors:

Percent measurement errors in Karl's car crash stats
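
For what it’s worth, the calculation behind that plot is simple. Here’s a rough sketch in R, with made-up numbers rather than my actual measurements:

measured_mm   <- 14.5   # ruler measurement for one bar
axis_width_mm <- 100    # measured width of the full x-axis
axis_maximum  <- 25     # value at the right end of the axis
actual        <- 3.7    # corresponding value from the posted data

estimate <- measured_mm / axis_width_mm * axis_maximum
100 * (estimate - actual) / actual    # percent error (about -2% here)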

Not too bad, really. Here are the biggest problems:

  • Mississippi, non-distracted: off by 6%, but that corresponded to 0.5 mm.
  • Rhode Island and Ohio, speeding: off by 40 and 35%, respectively. I’d written down 8 and 9 mm rather than 13 and 14 mm.
  • Maine and Indiana, alcohol: I’d written down 15.5 and 14.5 mm but typed 13.5 and 13 mm. In the former, I think I just misread my handwriting; in the latter, I think I typed the number for the state below (Iowa).

It’s also interesting that my “total” and “non-distracted” values were almost entirely underestimates, probably due to an error in my measurement of the overall width of the bar chart.

Also note: @brycem had recommended using WebPlotDigitizer for digitizing data from images.

Interactive plot of car crash stats

30 Oct 2014

I spent the afternoon making a D3-based interactive version of the graphs of car crash statistics by state that I’d discussed yesterday: my attempt to improve on the graphs in Mona Chalabi’s post at 538.

Screen shot of interactive graph of car crash statistics

See it in action here.

Code on github.

Scholarly Publishing Symposium at UW-Madison

30 Oct 2014

I’m at the Scholarly Publishing Symposium at UW-Madison today. There’s an interesting list of supplemental materials, but apparently it’s available only on paper:

Supplemental materials from UW-Madison Scholarly Publishing Symposium

So here they are electronically.

Improved graphs of car crash stats

29 Oct 2014

Last week, Mona Chalabi wrote an interesting post on car crash statistics by state, at fivethirtyeight.com.

I didn’t like the figures so much, though. There were a number of them like this:

One of the bar charts from Mona Chalabi’s post (chalabi-dearmona-drinking)

I’m giving a talk today about data visualization [slides | github], and I thought this would make a good example, so I spent some time creating versions that I like better.

Error notifications from R

4 Sep 2014

I’m enthusiastic about having R notify me when my script is done.

But in one of my early uses of this, my script threw an error, and I never got a text or pushbullet notification about it. And really, I’m even more interested in being notified about such errors than about anything else.

It’s relatively easy to get notified of errors. At the top of your script, include code like options(error = function() { } )

Fill in the function with your notification code. If there’s an error, the error message will be printed and then that function will be called. (And then the script will halt.)

You can use geterrmessage() to grab the error message to include in your notification.

For example, if you want to use RPushbullet for the notification, you could put, at the top of your script, something like this:

options(error = function() {
    library(RPushbullet)
    # post a pushbullet note with the error message as the body
    pbPost("note", "Error", geterrmessage())
})

Then if the script gives an error, you’ll get a note with title “Error” and with the error message as the body of the note.
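
The same trick works with any bit of notification (or logging) code. For example, if you’d rather not rely on RPushbullet, here’s a minimal sketch that just dumps the error message to a file (the file name is only for illustration):

options(error = function() {
    # write the error message to a log file rather than sending a note
    cat(geterrmessage(), file="script_error.log")
})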

Update: I knew I’d heard about this sort of thing somewhere, but I couldn’t remember where. Duh; Rasmus mentioned it on twitter just a couple of days ago! Fortunately, he reminded me of that in the comments below.

Notifications from R

3 Sep 2014

You’ve just set a long R job running. How will you know when it’s done? Have it notify you by beeping, sending you a text, or sending you a notification via pushbullet.
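
The details are in the full post, but here’s a rough sketch of the beep and pushbullet versions (I have the beepr and RPushbullet packages in mind; the message text is just an example):

# at the very end of a long-running script:
beepr::beep()    # play a sound when the job is done
RPushbullet::pbPost("note", "R job done",
                    "The long-running script has finished")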


The mustache photo

28 Aug 2014

A certain photo of me has been following me around for some time.

Karl with a mustache, 15 Nov 2002

The thing is sitting on my website, so I suppose I have only myself to blame. I actually quite like the photo. I look happy. I was happy. I’m not always happy.


Yet another R package primer

28 Aug 2014

Hadley Wickham is writing what will surely be a great book about the basics of R packages. And Hilary Parker wrote a very influential post on how to write an R package. So it seems like that topic is well covered.

Nevertheless, I’d been thinking for some time that I should write another minimal tutorial with an alliterative name, on how to turn R code into a package. And it does seem valuable to have a diversity of resources on such an important topic. (R packages are the best way to distribute R code, or just to keep track of your own personal R code, as part of a reproducible research process.)

So I’m going ahead with it, even though it doesn’t seem necessary: the R package primer.

It’s not completely done, but the basic stuff is there.
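
To give a sense of how little is needed, here’s one quick way to get started (not necessarily what the primer recommends; the function and package names are made up):

## a function you’d like to put in a package
myfun <- function(x) x + 1

## create a skeleton package directory, mypkg/, in the working directory
utils::package.skeleton(name="mypkg", list="myfun")

## then edit the generated DESCRIPTION and .Rd files, and run
## "R CMD build mypkg" followed by "R CMD check" on the result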

If I could do it over again, I’d self-publish

12 Aug 2014

In 2009, Śaunak Sen and I wrote a book about QTL mapping and the R/qtl software. We started working on it in the fall of 2006, and it was a heck of a lot of work.

We’d talked to several publishers, and ended up publishing with Springer. John Kimmel was the editor we worked with; I like John, and I felt that Springer (or John) did a good job of keeping prices reasonable. We were able to publish in full color with a list price of $99, so that on Amazon it was about $65. (In April, 2013, there was a brief period where it was just $42 at Amazon!)

Springer did arrange several rounds of reviews; they typically pay reviewers $100 or a few books. But the copy editing was terrible (at the very least, you want a copy editor to read the book, and it was pretty clear that our copy editor hadn’t), and the actual typesetting and construction of the index were left to us, the authors.

It feels nice to have written a proper book, but I don’t think it makes that big of a difference, for me or for readers.

And John Kimmel has since left Springer to go to Chapman & Hall/CRC, and Springer has raised the price of our book to $169, so it’s now selling for $130 at Amazon. I think that’s obnoxious. It’s not like they’ve gone back and printed extra copies, so it’s hard to see how their costs could have gone up. But in the publishing agreement we signed, we gave Springer full rights to set the price of the book.

I have a hard time recommending the book at that price; I’m tempted to help people find pirated PDFs online. (And seriously, if you can’t find a pirated copy, you should work on your internet skills.)

I corresponded with an editor at Springer about why our book has become so expensive and whether there’s anything we can do about it. They responded:

  • If we do a new edition, it could be listed as $129.
  • If the book is adopted by university classes, “the pricing grid it is based on would have lower prices.”
  • Our book is available electronically, for purchase by chapter as well.

Purchase by chapter? Yeah, for $30 per chapter!

Springer has published books and allowed the authors to post a PDF, but only for really big sellers, and ours is definitely not in that category.

I’m both disgusted and embarrassed by this situation. If I could do it all over again, I’d self-publish: post everything on the web, and arrange some way for folks to have it printed cheaply.

Testing an R package’s interactive graphs

1 Aug 2014

I’ve been working on an R package, R/qtlcharts, with D3-based interactive graphs for quantitative trait locus mapping experiments.

Testing the interactive charts it produces is a bit of a pain. It seems like I pretty much have to just open a series of examples in a web browser and tab through them manually, checking that they look okay, that the interactions seem to work, and that they’re not giving any sort of errors.

But if I want to post the package to CRAN, it seems (from the CRAN policy) that the examples in the .Rd files shouldn’t be opening a web browser. Thus, I need to surround the example code with \dontrun{}.

But I was using those examples, and R CMD check, to open the series of examples for manual checking.

So, what I’ve decided to do:

  • Include examples opening a browser, but within \dontrun{} so the browser isn’t opened in R CMD check.
  • Also include examples that don’t open the browser, within \dontshow{}, so that R CMD check will at least check the basics.
  • Write a ruby script that pulls out all of the examples from the .Rd files, stripping off the \dontrun{} and \dontshow{} and pasting it all into a single .R file. (An R-based alternative is sketched after this list.)
  • Periodically run R CMD BATCH on that set of examples, to do the manual checking of the interactive graphs.
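
(An aside on that ruby script: base R can do much of the extraction itself. Here’s a sketch of an alternative using tools::Rd2ex, keeping the \dontrun{} and \donttest{} code un-commented; the output file name is just for illustration.)

library(tools)

## extract the examples from each .Rd file into one .R file
rd_files <- list.files("man", pattern="\\.Rd$", full.names=TRUE)
out <- file("all_examples.R", "w")
for(f in rd_files)
    Rd2ex(f, out=out, commentDontrun=FALSE, commentDonttest=FALSE)
close(out)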

This will always be a bit of a pain, but with this approach I can do my manual testing in a straightforward way and still fulfill the CRAN policies.

Update: Hadley Wickham pointed me to \donttest{}, added in R version 2.7 (in 2008). (More value from blog + twitter!)

So I replaced my \dontrun{} bits with \donttest{}. And I can use devtools::run_examples() to run all of the examples, for my manual checks.
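
So the manual check is now just something like the following, run from within the package’s source directory. (Arguments to run_examples() control exactly which bits get run, and they have varied across devtools versions, so check its help page.)

library(devtools)

## run the examples from all of the package's .Rd files,
## to check the interactive graphs by hand
run_examples()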

