Posts Tagged ‘data analysis’

Initial steps towards reproducible research

4 Dec 2014

In anticipation of next week’s Reproducible Science Hackathon at NESCent, I was thinking about Christie Bahlai’s post on “Baby steps for the open-curious.”

Moving from Ye Olde Standard Computational Science Practice to a fully reproducible workflow seems a monumental task, but partially reproducible is better than not-at-all reproducible, and it’d be good to give people some advice on how to get started – to encourage them to get started.

So, I spent some time today writing another of my minimal tutorials, on initial steps towards reproducible research.

It’s a bit rough, and it could really use some examples, but it helped me to get my thoughts together for the Hackathon and hopefully will be useful to people (and something to build upon).

“My” chromosome 8p inversion

8 May 2013

There was lots of discussion on Twitter yesterday about Graham Coop’s paper with Peter Ralph (or vice versa), “The geography of recent genetic ancestry across Europe,” particularly regarding the FAQ they’d created.

I was eager to take a look, and, it’s slightly embarrassing to say, I first did a search to see if they’d made a connection to any of my work. (I’m probably not the only one to do that.) Sure enough, they cited a paper of mine, but it was Giglio et al. (2001) Am J Hum Genet 68: 874–883, on “my” chr 8p inversion, and not what I’d expected, my autozygosity paper.

What did the chr 8p inversion have to do with this? Search for “[36]” and you’ll find:

We find that the local density of IBD blocks of all lengths is relatively constant across the genome, but in certain regions the length distribution is systematically perturbed (see Figure S1), including around certain centromeres and the large inversion on chromosome 8 [36], also seen by [35].

The chr 8p inversion presents an interesting data analysis story from my postdoc years. In a nutshell: I was studying human crossover interference and found poor model fit for maternal chr 8. The cause was tight apparent triple-crossovers in two individuals in each of two families. I hypothesized that there was an inversion in the region, though it would have to be long, with both orientations common. The inversion was confirmed via FISH; it’s something like 5 Mbp long, with the frequencies of the two orientations being 40 and 60% in people of European ancestry.
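To see why an inversion can show up as a tight apparent triple-crossover, here is a toy sketch in Python. It is not the analysis from the paper. The five marker names are hypothetical, and it assumes the parent carries the B-C-D segment in the orientation opposite to the one the marker map assumes. A single crossover inside that segment then looks like three closely spaced crossovers when the alleles are read in map order.

```python
def count_switches(origins):
    """Count changes in grandparental origin between adjacent markers."""
    return sum(a != b for a, b in zip(origins, origins[1:]))

# Hypothetical markers, in the order assumed by the genetic map.
map_order = ["A", "B", "C", "D", "E"]

# Assumed true order on the parent's chromosome: segment B-C-D is inverted.
true_order = ["A", "D", "C", "B", "E"]

# One crossover on the true chromosome, falling between D and C:
# markers up to index 1 come from grandparent 0, the rest from grandparent 1.
crossover_after = 1
origin = {m: (0 if i <= crossover_after else 1)
          for i, m in enumerate(true_order)}

in_true_order = [origin[m] for m in true_order]  # [0, 0, 1, 1, 1]
in_map_order = [origin[m] for m in map_order]    # [0, 1, 1, 0, 1]

true_switches = count_switches(in_true_order)
apparent_switches = count_switches(in_map_order)

print("crossovers in true order:", true_switches)
print("apparent crossovers in map order:", apparent_switches)
```

Read in the true order there is one switch; read in the assumed map order there are three, all within the inverted segment and its flanks.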

Data science is statistics

5 Apr 2013

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

If you say that one kind of data analysis is statistics and another kind is not, you’re not allowing innovation. We need to define the field broadly.

You may not like what some statisticians do. You may feel they don’t share your values. They may embarrass you. But that shouldn’t lead us to abandon the term “statistics”.

5/9 data journalists use Excel

15 Jan 2013

…if I’ve counted correctly.

I’d not heard the term “data journalist” before. It’s nice that they have an open handbook.