Posts Tagged ‘graphics’

Halloween 2016 count

31 Oct 2016

Here’s a graph of the number of trick-or-treaters we saw this evening, by time. Ten of the 25 kids arrived in one big group. (Compare this to our 2011 experience.)

[Figure: Halloween 2016 count]

MongoDB with D3.js

22 Jun 2015

I consider interactive data visualization to be the critical tool for exploration of high-dimensional data.

That’s led me to spend a good amount of time in the last few years learning some new skills (D3 and CoffeeScript) and developing some new tools, particularly the R package R/qtlcharts, which provides interactive versions of the many data visualizations in R/qtl, my long-in-development R package for mapping genetic loci (called quantitative trait loci, QTL) that underlie complex trait variation in experimental organisms.

R/qtlcharts is rough in spots, and while it works well for moderate-sized data sets, it can’t handle truly large-scale data well, as it just dumps all of the data into the file viewed by a web browser.

For large-scale data, one needs to dynamically load slices of the data based on user interactions. It seems best to have a formal database behind the scenes. But I think I’m not unusual, among statisticians, in having almost no experience working with databases. My collaborators tend to keep things in Excel. Even for quite large problems, I keep things in flat files.

So, I’ve been trying to come to understand the whole database business, and how I might use one for larger-scale data visualizations. I think I’ve finally made that last little conceptual step, where I can see what I need to do. I made a small illustration in my d3examples repository on GitHub.

(more…)

Car crash stats revisited: My measurement errors

3 Nov 2014

Last week, I created revised versions of graphs of car crash statistics by state (including an interactive version), from a post by Mona Chalabi at 538.

Since I was working on those at the last minute in the middle of the night, to be included as an example in a lecture on creating effective figures and tables, I just read the data off printed versions of the bar charts, using a ruler.

I later emailed Mona Chalabi, and she and Andrew Flowers quickly posted the data to github.com/fivethirtyeight/data. (That repository has a lot of interesting data, and if you see data at 538 that you’re interested in, just ask them!)

I was curious to look at how I’d done with my measurements and data entry. Here’s a plot of my percent errors:

[Figure: Percent measurement errors in Karl's car crash stats]

Not too bad, really. Here are the biggest problems:

  • Mississippi, non-distracted: off by 6%, but that corresponded to 0.5 mm.
  • Rhode Island and Ohio, speeding: off by 40 and 35%, respectively. I’d written down 8 and 9 mm rather than 13 and 14 mm.
  • Maine and Indiana, alcohol: wrote 15.5 and 14.5 mm, but typed 13.5 and 13 mm. In the former, I think I just misinterpreted my writing; in the latter, I think I wrote the number for the state below (Iowa).
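
The speeding figures in the list above can be reproduced with a little arithmetic. This is a minimal sketch, assuming percent error is defined as 100 × (recorded − actual) / actual; that definition and the variable names are assumptions for illustration, and only the millimeter values come from the list.

```r
# Millimeter values quoted in the list above (speeding)
recorded <- c("Rhode Island" = 8, "Ohio" = 9)    # what was written down
actual   <- c("Rhode Island" = 13, "Ohio" = 14)  # what should have been written

# Assumed definition of percent error
pct_error <- 100 * (recorded - actual) / actual

round(pct_error, 1)
```

Under that definition the errors come out at about -38% and -36%, in line with the rounded 40 and 35% quoted above.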

It’s also interesting to note that my “total” and “non-distracted” were almost entirely under-estimates: probably an error in the measurement of the overall width of the bar chart.

Also note: @brycem had recommended using WebPlotDigitizer for digitizing data from images.

Interactive plot of car crash stats

30 Oct 2014

I spent the afternoon making a D3-based interactive version of the graphs of car crash statistics by state that I’d discussed yesterday: my attempt to improve on the graphs in Mona Chalabi’s post at 538.

[Figure: Screen shot of interactive graph of car crash statistics]

See it in action here.

Code on GitHub.

Improved graphs of car crash stats

29 Oct 2014

Last week, Mona Chalabi wrote an interesting post on car crash statistics by state, at fivethirtyeight.com.

I didn’t like the figures so much, though. There were a number of them like this:

[Figure: chalabi-dearmona-drinking, one of the figures from Mona Chalabi's post]

I’m giving a talk today about data visualization [slides | github], and I thought this would make a good example, so I spent some time creating versions that I like better.
(more…)

Testing an R package’s interactive graphs

1 Aug 2014

I’ve been working on an R package, R/qtlcharts, with D3-based interactive graphs for quantitative trait locus mapping experiments.

Testing the interactive charts it produces is a bit of a pain. It seems like I pretty much have to just open a series of examples in a web browser and tab through them manually, checking that they look okay, that the interactions seem to work, and that they’re not giving any sort of errors.

But if I want to post the package to CRAN, it seems (from the CRAN policy) that the examples in the .Rd files shouldn’t be opening a web browser. Thus, I need to surround the example code with \dontrun{}.

But I was using those examples, and R CMD check, to open the series of examples for manual checking.

So, what I’ve decided to do:

  • Include examples opening a browser, but within \dontrun{} so the browser isn’t opened in R CMD check.
  • Also include examples that don’t open the browser, within \dontshow{}, so that R CMD check will at least check the basics.
  • Write a Ruby script that pulls out all of the examples from the .Rd files, stripping off the \dontrun{} and \dontshow{} and pasting it all into a .R file.
  • Periodically run R CMD BATCH on that set of examples, to do the manual checking of the interactive graphs.
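
The extraction step in the list above could also be done from within R. What follows is a sketch of that alternative, not the Ruby script itself: it relies on tools::Rd2ex() from the tools package that ships with R, and the helper name extract_examples() and its default paths are hypothetical.

```r
library(tools)

# Hypothetical helper: gather the examples from every .Rd file in `man_dir`
# into one .R file, leaving \dontrun{} and \donttest{} code uncommented.
extract_examples <- function(man_dir = "man", out_file = "all_examples.R") {
  rd_files <- list.files(man_dir, pattern = "\\.Rd$", full.names = TRUE)
  code <- character(0)
  for (rd in rd_files) {
    tmp <- tempfile(fileext = ".R")
    # commentDontrun = FALSE keeps the \dontrun{} code as live code
    Rd2ex(rd, out = tmp, commentDontrun = FALSE, commentDonttest = FALSE)
    # Guard against help files with no examples, where no output is expected
    if (file.exists(tmp)) {
      code <- c(code, readLines(tmp))
      unlink(tmp)
    }
  }
  writeLines(code, out_file)
  invisible(out_file)
}
```

Calling extract_examples() from the package directory would write all_examples.R, which could then be run with R CMD BATCH for the manual checks.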

This will always be a bit of a pain, but with this approach I can do my manual testing in a straightforward way and still fulfill the CRAN policies.

Update: Hadley Wickham pointed me to \donttest{}, added in R version 2.7 (in 2008). (More value from blog + twitter!)

So I replaced my \dontrun{} bits with \donttest{}. And I can use devtools::run_examples() to run all of the examples, for my manual checks.

2014 UseR conference, days 1-2

2 Jul 2014

I’m at UCLA for the UseR Conference. I attended once before, and I really enjoyed it. And I’m really enjoying this one. I’m learning a ton, and I find the talks very inspiring.

In my comments below, I give short shrift to some speakers (largely by not having attended their talks), and I’m critical in some places about the conference organization. Having co-organized a small conference last year, I appreciate the difficulties. I think the organizers of this meeting have done a great job, but there are some ways in which it might have been better (e.g., no tiny rooms, a better time slot for the posters, and more space for the posters).

(more…)

Further points on crayon colors

9 May 2014

I saw this great post on Crayola crayon colors at the Learning R blog, reproducing a nice graph of the Crayola crayon colors over time. (Also see this even nicer version.)

The Learning R post shows how to grab the crayon colors from the Wikipedia page, “List of Crayola crayon colors,” directly in R. Here’s the code (after some slight modifications due to changes in the page since 2010):

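library(XML)
# Wikipedia page listing the Crayola crayon colors
theurl <- "http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors"
# read the first HTML table on the page, keeping text as character strings
crayontable <- readHTMLTable(theurl, stringsAsFactors = FALSE)[[1]]
# keep the column of hex codes, and name each code by its color
crayons <- crayontable[,grep("Hex", colnames(crayontable))]
names(crayons) <- crayontable[,"Color"]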

Comparing these to what I’d grabbed, I noted one small discrepancy on the Wikipedia page: Yellow Orange was listed as "#FFAE42" but the background color for the Yellow Orange cell in the table was "#FFB653".

So I created a Wikipedia account and edited the Wikipedia page.

(Then I realized that I’d made a mistake in my edit, undid my change, thought the whole thing through again, and edited the page again.)

The Learning R post also showed a different way to sort the colors: convert to HSV, and then sort by H, then S, then V. So I edited my plot_crayons() function again, to create the following picture:

[Figure: Crayon colors, again]
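
The HSV ordering described above might look something like the following. This is a sketch rather than the actual code in plot_crayons(); it uses a handful of the hex codes quoted elsewhere on this page, and the variable names are chosen for illustration.

```r
# A few crayon colors (hex codes as quoted on this page)
crayons <- c("Almond" = "#efdecd", "Antique Brass" = "#cd9575",
             "Apricot" = "#fdd9b5", "Yellow Green" = "#c5e384",
             "Yellow Orange" = "#ffb653")

# hex -> RGB -> HSV; rgb2hsv() returns a matrix with rows h, s, v
hsv_vals <- rgb2hsv(col2rgb(crayons))

# sort by hue, breaking ties by saturation and then by value
ord <- order(hsv_vals["h", ], hsv_vals["s", ], hsv_vals["v", ])
sorted_crayons <- crayons[ord]

names(sorted_crayons)
```

Sorting on hue first keeps similar hues together, with saturation and value only breaking ties.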

Two more points about crayon colors

8 May 2014

If you want to use crayon colors in R but you don’t want to rely on my R/broman package, you can just grab the code. Copy the relevant lines from the R/brocolors.R file:

crayons = c("Almond"="#efdecd",
            "Antique Brass"="#cd9575",
            "Apricot"="#fdd9b5",
            ...
            "Yellow Green"="#c5e384",
            "Yellow Orange"="#ffb653")

I spent a bit of time thinking about how best to sort the colors in a meaningful way, for the plot_crayons() function. But then I decided to stop thinking and just do something brainless: measure distance between colors by RMS difference of the RGB values, and then use hierarchical clustering. Here’s the code from plot_crayons():

# get RGB values: a matrix with one row per color
colval <- t(col2rgb(crayons))

# hclust on distances between the RGB values, to order the colors
ord <- hclust(dist(colval))$order

It’s not perfect, but I think it worked remarkably well:

[Figure: Crayon colors]
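
A note on the code above: dist() computes Euclidean distance by default, which for three RGB channels is the RMS difference multiplied by sqrt(3), so the clustering should come out the same either way. Here is a small check of that relationship, using a few of the hex codes listed above; the rms() helper is written just for this illustration.

```r
crayons <- c("Almond" = "#efdecd", "Antique Brass" = "#cd9575",
             "Apricot" = "#fdd9b5", "Yellow Green" = "#c5e384",
             "Yellow Orange" = "#ffb653")
colval <- t(col2rgb(crayons))

# Euclidean distances between colors, as dist() computes by default
euclid <- dist(colval)

# RMS difference of the RGB values, computed pair by pair
rms <- function(i, j) sqrt(mean((colval[i, ] - colval[j, ])^2))
pairs <- combn(nrow(colval), 2)
rms_vals <- apply(pairs, 2, function(p) rms(p[1], p[2]))

# the two measures differ only by a constant factor of sqrt(3)
all.equal(as.numeric(euclid), sqrt(3) * rms_vals)
```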

Crayon colors in R

7 May 2014

Last night I was working on a talk on creating effective graphs. Mostly, I needed to update the colors, as there’d been some gaudy ones in its previous form (e.g., slide 22).

I usually pick colors using the crayons in the Mac Color Picker. But that has just 40 crayons, and I wanted more choices.

That led me to the list of Crayola crayon colors on Wikipedia. I wrote a Ruby script to grab the color names and codes and added them to my R/broman package.

Use brocolors("crayons") to get the list of colors. For example, to get “Tickle Me Pink,” use

library(broman)
pink <- brocolors("crayons")["Tickle Me Pink"]

Use plot_crayons() to get the following summary plot of the colors:

[Figure: Crayon colors]

You can install the R/broman package using install_github in devtools (specifically, install_github("kbroman/broman")), or wait a day or two and the version with this code will be on CRAN.

Update: See also Two more points about crayon colors.