Posts Tagged ‘D3’

MongoDB with D3.js

22 Jun 2015

I consider interactive data visualization to be the critical tool for exploration of high-dimensional data.

That’s led me to spend a good amount of time in the last few years learning some new skills (D3 and CoffeeScript) and developing some new tools, particularly the R package R/qtlcharts, which provides interactive versions of the many data visualizations in R/qtl, my long-in-development R package for mapping genetic loci (called quantitative trait loci, QTL) that underlie complex trait variation in experimental organisms.

R/qtlcharts is rough in spots, and while it works well for moderate-sized data sets, it doesn’t handle truly large-scale data well, because it just dumps all of the data into the file viewed by the web browser.

For large-scale data, one needs to dynamically load slices of the data based on user interactions. It seems best to have a formal database behind the scenes. But I think I’m not unusual, among statisticians, in having almost no experience working with databases. My collaborators tend to keep things in Excel. Even for quite large problems, I keep things in flat files.
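A minimal sketch of what “dynamically loading slices” could look like, under loud assumptions: the names `makeSliceLoader` and `fetchSlice` are hypothetical, the slices are keyed by chromosome purely for illustration, and an in-memory array stands in for the database. A real version would query a database and would almost certainly be asynchronous.

```javascript
// Hypothetical sketch: load only the slice of data that a user interaction asks for.
// `fetchSlice` stands in for a real database query; here it is any function
// that returns the records for one chromosome.
function makeSliceLoader(fetchSlice) {
  const cache = new Map(); // slices already loaded, keyed by chromosome
  return function loadSlice(chr) {
    if (!cache.has(chr)) {
      cache.set(chr, fetchSlice(chr)); // query only on first request
    }
    return cache.get(chr);
  };
}

// Stand-in "database": one record per marker (made-up values).
const records = [
  { chr: "1", pos: 10.5, lod: 1.2 },
  { chr: "1", pos: 22.0, lod: 3.4 },
  { chr: "2", pos: 5.1, lod: 0.7 },
];

let queryCount = 0; // counts how often the stand-in database is hit
const loadSlice = makeSliceLoader((chr) => {
  queryCount += 1;
  return records.filter((r) => r.chr === chr);
});
```

The point of the cache is that clicking back and forth between chromosomes would not repeat a query that has already been answered.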

So, I’ve been trying to come to understand the whole database business, and how I might use one for larger-scale data visualizations. I think I’ve finally made that last little conceptual step, where I can see what I need to do. I made a small illustration in my d3examples repository on GitHub.



Interactive plot of car crash stats

30 Oct 2014

I spent the afternoon making a D3-based interactive version of the graphs of car crash statistics by state that I’d discussed yesterday: my attempt to improve on the graphs in Mona Chalabi’s post at 538.

Screen shot of interactive graph of car crash statistics

See it in action here.

Code on github.

Testing an R package’s interactive graphs

1 Aug 2014

I’ve been working on an R package, R/qtlcharts, with D3-based interactive graphs for quantitative trait locus mapping experiments.

Testing the interactive charts it produces is a bit of a pain. It seems like I pretty much have to just open a series of examples in a web browser and tab through them manually, checking that they look okay, that the interactions seem to work, and that they’re not giving any sort of errors.

But if I want to post the package to CRAN, it seems (from the CRAN policy) that the examples in the .Rd files shouldn’t be opening a web browser. Thus, I need to surround the example code with \dontrun{}.

But I was using those examples, and R CMD check, to open the series of examples for manual checking.

So, what I’ve decided to do:

  • Include examples opening a browser, but within \dontrun{} so the browser isn’t opened in R CMD check.
  • Also include examples that don’t open the browser, within \dontshow{}, so that R CMD check will at least check the basics.
  • Write a ruby script that pulls out all of the examples from the .Rd files, stripping off the \dontrun{} and \dontshow{} and pasting it all into a .R file.
  • Periodically run R CMD BATCH on that set of examples, to do the manual checking of the interactive graphs.
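The stripping step in the list above can be sketched as follows. This is not the Ruby script mentioned there; it is a JavaScript illustration with a hypothetical function name (`stripRdWrappers`), and it assumes the wrappers are not nested and that braces inside them are balanced and unescaped.

```javascript
// Hypothetical sketch: remove \dontrun{...} and \dontshow{...} wrappers from
// example text, keeping the code inside them.
// Simplifications: wrappers are not nested, and every "{" inside a wrapper
// has a matching "}" (escaped braces and braces in strings are not handled).
function stripRdWrappers(text) {
  const opener = /\\(dontrun|dontshow)\{/g;
  let out = "";
  let copiedUpTo = 0;
  let match;
  while ((match = opener.exec(text)) !== null) {
    out += text.slice(copiedUpTo, match.index); // text before the wrapper
    const innerStart = opener.lastIndex;
    let depth = 1;
    let j = innerStart;
    while (j < text.length && depth > 0) {      // find the matching close brace
      if (text[j] === "{") depth += 1;
      else if (text[j] === "}") depth -= 1;
      j += 1;
    }
    const innerEnd = depth === 0 ? j - 1 : j;   // drop the closing brace if found
    out += text.slice(innerStart, innerEnd);
    copiedUpTo = j;
    opener.lastIndex = j;                       // continue after the wrapper
  }
  return out + text.slice(copiedUpTo);
}
```

Counting brace depth, rather than stopping at the first `}`, matters because R code inside the wrapper often contains its own braces.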

This will always be a bit of a pain, but with this approach I can do my manual testing in a straightforward way and still fulfill the CRAN policies.

Update: Hadley Wickham pointed me to \donttest{}, added in R ver 2.7 (in 2008). (More value from blog + twitter!)

So I replaced my \dontrun{} bits with \donttest{}. And I can use devtools::run_examples() to run all of the examples, for my manual checks.

Data structures are important

19 Mar 2013

I’ve created another D3 example, of QTL analysis for a phenotype measured over time. (Click on the image for the interactive version.)

QTL analysis with phenotype over time

The code is on github. It took me about a day.

The hardest part was figuring out the right data structures. A pixel here is linked to curves over there and over there and over there. You need to set things up so it’s easy to traverse such linkages.

If you hover over a point in the top-left image, you get views of the vertical and horizontal cross-sections. If you click on a point, pointwise confidence bands are added to the “QTL effect” plot. (You have to click, because if I included those confidence bands automatically, the graph became painfully slow to refresh.)
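One way such linkages might be set up, as a sketch rather than the code behind the graph: assume the image is stored as a matrix with one row per horizontal line of pixels and one column per vertical line, so that a hovered pixel at (`row`, `col`) leads directly to both cross-sections. The name `crossSections` and the values are made up for illustration.

```javascript
// Hypothetical sketch: matrix[row][col] holds the value shown at one pixel.
// Given the hovered pixel, return the two curves linked to it.
function crossSections(matrix, row, col) {
  return {
    horizontal: matrix[row].slice(),     // all values in the same row
    vertical: matrix.map((r) => r[col]), // the same column across all rows
  };
}

// Made-up 2 x 3 example.
const image = [
  [0.1, 0.5, 0.2],
  [0.3, 2.1, 0.4],
];
const linked = crossSections(image, 1, 1);
```

With this layout the horizontal slice is a single array lookup, while the vertical slice needs one pass over the rows; storing the transpose as well would trade memory for making both lookups cheap.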

I’m not completely happy with the layout of the graph; it’s not particularly intuitive.

Why aren’t all of our graphs interactive?

16 Mar 2013

I’ve come to believe that, for high-dimensional data, visualizations (aka graphs), and particularly interactive graphs, can be more important than precise statistical inference.

We first need to be able to view and explore the data, and when it is unusually abundant, that is especially hard. This was a primary contributor to my recent embarrassments, in which clear problems in the data were not discovered when they should have been.

I gave a talk on interactive graphs (with the title above) at Johns Hopkins last fall, and then a related talk at ENAR earlier this week, and I have a few thoughts to add here.

A brief digression

I’m giving a talk at a plant breeding symposium at Kansas State in a couple of weeks, and I’ve been pondering what to talk about. A principal problem is that I don’t really work on plant breeding. My most relevant talks are a bit too technical, and my more interesting talks are not relevant.

Then I had the idea to talk about some of my recent work with my graduate student, Il-youp Kwak, on the genetic analysis of phenotypes measured over time.

I realized that I could incorporate some interactive graphs into the talk. Initially I was just thinking that the interactive graphs would make the talk more interesting and would allow me to talk about things that weren’t necessarily relevant but were interesting to me.

But then I realized that this work really cries out for interactive graphs. And as I began to construct one of them, I thought of a whole bunch more I might create. More importantly, I realized that these interactive graphs are extremely useful teaching tools.

More D3 examples

Here’s an image of the first graph I created for the talk; click on it to jump to the interactive version.

Statisticians are often confronted with a large set of curves. We’d like to show the individual curves, but there are too many. The resulting spaghetti plot is a total mess. An image plot (like the lasagna plot) allows us to see all of the curves, but it can be hard to get a sense of what the actual curves look like. The interactive version solves the problem.

Many curves

Here’s a second example; again click on the image to jump to the interactive version. (I’ve shown this before, but I want to use it to make another point.)

Typically, in a lecture on complex trait analysis, I’d show one LOD curve (like the top panel in the image below) and a few different plots of phenotype vs genotype (the lower-right panel in the image). I think the exploratory tool will be much more effective, in a lecture, for explaining what it all means.

LOD and QTL effects

Statisticians need to be doing this routinely

In constructing a graph, one must make some difficult choices. For high-dimensional data, one must greatly compress the available information. The resulting summaries, while potentially informative, take one far away from the original data.

Interactive graphs provide a means through which one may view the overall summary but have immediate access to the underlying details.

Interactive eQTL plot with d3.js

6 Mar 2013

I just finished an interactive eQTL plot using D3, in preparation for my talk on interactive graphics at the ENAR meeting next week.

Static view of interactive eQTL plot

The code (in CoffeeScript) is available at github. But beware: it’s pretty awful.

The hardest part was setting up the data files. Well, that plus the fact that I just barely know what I’m doing in D3.

charset="utf-8"

2 Mar 2013

To use the latest version of D3, you need to include charset="utf-8" in the <script> tag that loads it.

I’m giving a talk at ENAR in just over a week, on interactive graphics. My slides (still in preparation) are on the web.

The slides were working fine locally on my laptop, but they weren’t working on my web server…I was getting a syntax error regarding an illegal character.

I figured out that I needed to add charset="utf-8", like so:

<script charset="utf-8" type="text/javascript" src="js/d3.js">
</script>

D3.js difficulties

8 Feb 2013

I’m pleased with my progress learning JavaScript and D3. (I’m actually writing CoffeeScript rather than JavaScript.)

But I spent a lot of time thrashing about yesterday, due mostly to two silly errors.

Put the script in the body

First, I’d tried to make a truly simple example, making just an SVG with a little rectangle.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Put script in body</title>
    <script type="text/javascript" 
        src="http://d3js.org/d3.v3.min.js"></script>

    <script type="text/javascript">
    var h=50;
    var svg = d3.select("body").append("svg")
                .attr("height", h).attr("width", h);
    svg.append("rect").attr("x", 0).attr("y", 0)
                .attr("height", h).attr("width",h);
    </script>
</head>

<body>
</body>
</html>

But this gives nothing. You need to move the script from the head to the body, as it is here. Then it works.

I don’t really understand this. Perhaps I should go back to my reading.

[Update: I think I’ve figured this out. If you put the script in the head, the code gets run before the body exists, and so there’s no body in which to append the SVG. Conclusion: Put the link to your script at the very bottom of the body of the html file.]
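An alternative to moving the script, sketched here under assumptions: leave it in the head but delay the drawing code until the document has been parsed. `document.readyState` and the `DOMContentLoaded` event are standard browser features; the helper name `whenBodyReady` is made up, and it takes the document as a parameter only so that it can be tried outside a browser.

```javascript
// Run `callback` once the document has been parsed, so that <body> exists.
// `doc` is normally the browser's `document`.
function whenBodyReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: wait until the whole document has been read.
    doc.addEventListener("DOMContentLoaded", callback);
  } else {
    // Already parsed: the body is there, so run right away.
    callback();
  }
}

// In a page, the drawing code would go inside the callback, e.g.:
// whenBodyReady(document, function () {
//   var h = 50;
//   var svg = d3.select("body").append("svg")
//               .attr("height", h).attr("width", h);
// });
```

Putting the script at the bottom of the body is simpler; this form is mainly useful when the script has to stay in the head.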

Don’t forget enter()

In D3, you bind data to a bunch of objects, and attributes of the objects can be controlled by features of the data. I write code sort of like this:

svg.append("g").selectAll("empty")
   .data(thedata)
   .enter()
   .append("rect")
   .attr("x", (d) -> start[d])
   .attr("y", pad.top)
   .attr("width", (d) -> end[d] - start[d])
   .attr("height", (d) -> hInner)
   .attr("fill", (d) -> color[d])
   .attr("stroke", "none")

My most common mistake so far: I forget the .enter() part. You don’t get an error message, but the objects don’t get created.
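Why the omission fails silently can be seen in a toy model of the data join. This is plain JavaScript and not D3’s implementation; `joinByIndex` is a made-up name, and it assumes matching by position, which is D3’s default when no key function is given.

```javascript
// Toy model of a data join that matches data to existing elements by index.
function joinByIndex(existingElements, data) {
  const n = Math.min(existingElements.length, data.length);
  return {
    update: data.slice(0, n),        // data matched to elements that already exist
    enter: data.slice(n),            // data with no element yet
    exit: existingElements.slice(n), // elements left without data
  };
}

// Selecting nothing (as with selectAll("empty")) and then binding three data items:
const joined = joinByIndex([], ["a", "b", "c"]);
// joined.update is empty, so anything chained straight onto it touches nothing;
// the three items needing new elements are all in joined.enter.
```

Without .enter(), the chained .append() works on the update part, which has no elements, so nothing is created and nothing is reported.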

Interactive graphics with d3.js

8 Feb 2013

I’m making some progress learning D3 (for interactive graphics), by which I mean I’ve gotten a couple of examples to work.

Many box plots

First, an example for displaying many distributions. Here I’m considering a set of nearly 500 gene expression microarrays, each with 40,000 or so measurements. It’s hard to look at 500 box plots side-by-side, and with 40k measurements, traditional box plots don’t give enough information about the tails.

Many box plots

In the top figure, the 500 arrays are aligned next to each other, sorted by their median, and then I show the 1, 5, 10, 25, 50, …, 99th percentiles. The advantage of the interactive plot is that you can hover over a given array on the top and see a more detailed histogram below. And if you click on an array, its histogram will be retained below, for easy comparison to other arrays.
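The summaries behind the top figure can be sketched roughly as follows. The percentile method is an assumption (linear interpolation between order statistics, the default in R’s `quantile`); the post doesn’t say which was used, and the function names are made up.

```javascript
// Percentile by linear interpolation between order statistics
// (an assumed method; p is a proportion between 0 and 1).
function quantile(values, p) {
  const sorted = values.slice().sort((a, b) => a - b);
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Order a set of arrays by their medians, without modifying the input.
function sortByMedian(arrays) {
  return arrays.slice().sort((a, b) => quantile(a, 0.5) - quantile(b, 0.5));
}

// One array's summary at a few of the percentiles mentioned above.
function percentileSummary(values, probs = [0.01, 0.05, 0.1, 0.25, 0.5]) {
  return probs.map((p) => quantile(values, p));
}
```

Computing these summaries once, ahead of time, is what would keep the top panel light: roughly 500 short lists of percentiles rather than 500 × 40,000 raw measurements.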

LOD curves and QTL effects

As a second example, I plot the LOD curves from QTL analysis across the genome; click on a chromosome above and you get a more detailed view of that chromosome in the bottom-left; click on a marker position in the bottom-left, and you get a view of the QTL effect on the bottom-right.

LOD curves and QTL effects

Likely none of that is understandable; let me try to explain. QTL stands for “quantitative trait locus,” a region of the genome (i.e., locus) that influences some quantitative trait (like insulin level in serum). To identify QTL, we look at the association between the quantitative trait and genotype at each of many genetic markers across the genome. We’re basically doing analysis of variance, but we express the results as a log10 likelihood ratio, called the LOD score.
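To make the connection to analysis of variance concrete, here is a sketch for a single marker, under stated assumptions: genotypes are fully observed, and the trait is normally distributed within each genotype group with a common variance. In that case the LOD score reduces to (n/2) × log10(RSS0/RSS1), where RSS0 is the residual sum of squares around the overall mean and RSS1 is the residual sum of squares around the genotype-group means. The function name is made up, and this is not what any particular QTL package does between markers.

```javascript
// LOD score at one marker, assuming fully observed genotypes and a normal
// model with common variance: (n/2) * log10(RSS0 / RSS1).
function lodScore(phenotypes, genotypes) {
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const rss = (xs) => {
    const m = mean(xs);
    return xs.reduce((s, x) => s + (x - m) ** 2, 0);
  };

  // Null model: one mean for everyone.
  const rss0 = rss(phenotypes);

  // Alternative model: a separate mean for each genotype group.
  const groups = new Map();
  phenotypes.forEach((y, i) => {
    const g = genotypes[i];
    if (!groups.has(g)) groups.set(g, []);
    groups.get(g).push(y);
  });
  let rss1 = 0;
  for (const ys of groups.values()) rss1 += rss(ys);

  // Note: if the groups fit perfectly (rss1 === 0), this is Infinity.
  return (phenotypes.length / 2) * Math.log10(rss0 / rss1);
}

// Made-up example: two genotype groups with clearly different averages.
const exampleLod = lodScore([1, 2, 3, 4], ["A", "A", "B", "B"]);
```

In the made-up example, RSS0 is 5 and RSS1 is 1, so the LOD score is 2 × log10(5), or about 1.4.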

The advantage of this interactive graph is that you have some ability to look at the underlying genotype/phenotype association, rather than just rely on LOD curves. It would be nice to include the option of a dot-plot on the lower right, rather than just the within-group averages.