Archive for May, 2013

More on Chutes & Ladders

20 May 2013

Matt Maenner asked about the sawtooth pattern in the figure in my last post on Chutes & Ladders.

Damn you, Matt! I thought I was done with this. Don’t feed my obsession.

My response was that if the game ends early, it’s even more likely that it’ll be the kid who went first who won. But, my intuition was wrong: exactly the opposite is true. It is the advantage to the first player that causes the sawtooth pattern, but that advantage increases with the number of rounds rather than decreases.

Numerical results

While it’s fast and easy to study the Chutes & Ladders game by simulation, if you want to answer questions more precisely, it’s best to switch to more exact results.

Consider a single individual playing the game, and let $X_n$ be his/her location at round $n$. The $X_n$ form a Markov chain, in that the future ($X_{n+1}$), given the present ($X_n$), is conditionally independent of the history ($X_1, \dots, X_{n-1}$).

It’s relatively easy to construct the transition matrix of the chain. (See my R code.) This is a matrix $P$, with $P_{ij} = \Pr(X_{n+1} = j \mid X_n = i)$.

Then the probability that a player has reached state 100 by round $n$ is
$(1, 0, \dots, 0)\, P^n\, (0, \dots, 0, 1)'$. That’s the cumulative distribution function (cdf) of the number of rounds for a single player to finish the game. Call this $q_n$. You can get the probability distribution by differences, say $p_n = q_n - q_{n-1}$.
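As a rough sketch of this calculation (not the R code linked above), the following builds a transition matrix and iterates it to get the cdf and its differences. The ladder and chute positions are an assumption (the commonly cited Milton Bradley layout, which the post does not list), as is the rule that a spin overshooting square 100 leaves the player in place; `jumps`, `n_max`, and `state` are names made up for this example.

```r
# Assumed board: names are ladder/chute starts, values are where they lead.
jumps <- c("1"=38, "4"=14, "9"=31, "21"=42, "28"=84, "36"=44, "51"=67,
           "71"=91, "80"=100,                          # ladders
           "16"=6, "47"=26, "49"=11, "56"=53, "62"=19, "64"=60,
           "87"=24, "93"=73, "95"=75, "98"=78)         # chutes

# States 0 (off the board) through 100; row/column i+1 is square i.
P <- matrix(0, nrow=101, ncol=101)
for(from in 0:99) {
    for(spin in 1:6) {
        to <- from + spin
        if(to > 100) to <- from              # assumed rule: exact landing on 100
        jump <- jumps[as.character(to)]
        if(!is.na(jump)) to <- unname(jump)  # follow a ladder or chute
        P[from+1, to+1] <- P[from+1, to+1] + 1/6
    }
}
P[101, 101] <- 1                             # square 100 is absorbing

n_max <- 300
q <- numeric(n_max)
state <- c(1, rep(0, 100))   # start in state 0 with probability 1
for(n in 1:n_max) {
    state <- state %*% P     # distribution of the location at round n
    q[n] <- state[101]       # Pr(reached square 100 by round n)
}
p <- diff(c(0, q))           # Pr(finish in exactly n rounds)
```

Iterating the row vector avoids forming the matrix power explicitly; `q[n]` is the same quantity as the vector-matrix-vector product in the text.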

To calculate the number of rounds to complete a game with $k$ players, you want the minimum of $k$ independent draws from this distribution. The probability that a game with $k$ players is complete by round $n$ is $1 - (1 - q_n)^k$. And again you can get the probability distributions by differences. Here’s a picture.

No. rounds to complete Chutes & Ladders
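To make the minimum-of-k-draws step concrete, here is a small sketch. The single-player cdf is a hypothetical stand-in (each round ends with probability 0.05), not the real Chutes & Ladders cdf, and `rounds_dist` is a name invented for this example.

```r
# Hypothetical single-player cdf, for illustration only:
# a player finishes in each round with probability 0.05.
n <- 1:400
q <- 1 - 0.95^n

rounds_dist <- function(q, k) {
    cdf <- 1 - (1 - q)^k     # Pr(game with k players is complete by round n)
    diff(c(0, cdf))          # Pr(game ends in exactly round n)
}

for(k in 2:4) {
    d <- rounds_dist(q, k)
    cat(k, "players: mean no. rounds =", round(sum(n * d), 2), "\n")
}
```

Substituting the cdf of the actual game for `q` would give the distributions shown in the figure.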

Advantage to the first player

Now, regarding the advantage to the first player: note that the first player wins in exactly $n$ steps if he gets to the finish at $n$ steps and none of the other players are done by $n-1$ steps. So, with $k$ players, the probability that the first player wins in exactly $n$ steps is $p_n (1 - q_{n-1})^{k-1}$.

The chance that the second player wins in exactly $n$ steps is $(1 - q_n)\, p_n\, (1 - q_{n-1})^{k-2}$, with the last term included only if there are $k > 2$ players.

From this idea, it’s straightforward to calculate the probability that the first player wins given that the game is complete at round n. Here’s a plot of that probability as a function of the number of players, relative to the nominal probability (1/2, 1/3, 1/4).

Advantage to the first player in Chutes & Ladders

Note that n=7 is the minimum number of rounds to complete the game. I’d thought that the first player’s advantage went down over time, but the opposite is true.
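The same reasoning extends to any seat: player j wins in exactly round n if the j-1 players ahead of them are not done by round n, player j finishes at round n, and the k-j players behind them were not done by round n-1. A minimal sketch, again using a hypothetical stand-in cdf rather than the real one (`win_prob` is a name made up here):

```r
win_prob <- function(q, k) {
    p      <- diff(c(0, q))              # Pr(a player finishes in exactly round n)
    s_now  <- 1 - q                      # Pr(a player is not done by round n)
    s_prev <- 1 - c(0, q[-length(q)])    # Pr(a player is not done by round n-1)
    # column j: Pr(player j wins in exactly round n)
    sapply(1:k, function(j) p * s_now^(j-1) * s_prev^(k-j))
}

# Hypothetical cdf, for illustration only (not the real game)
q <- 1 - 0.95^(1:200)
w <- win_prob(q, 3)

# Pr(first player wins | game ends in round n), relative to the nominal 1/3
first_rel <- (w[, 1] / rowSums(w)) / (1/3)
```

With this toy cdf the ratio is the same in every round; the growth with n described above comes from the shape of the real game's distribution. With the real cdf, rounds that have probability zero would need to be dropped before dividing.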

No. spins to end the game

Combining these two results (on the number of rounds to complete the game and the probability that player i will win in n rounds), we can get a more precise version of the simulation-based figure in my last post:

No. spins to complete Chutes & Ladders, numerical results

As you can see, the sawtooth pattern becomes more pronounced with the number of rounds, but then it gets lost in the downward slope of the distribution on the right side. (Again, see my R code.)
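One way to sketch the combination: if player j wins in round n of a k-player game, the game took k(n-1) + j spins, so the per-player, per-round win probabilities can be laid out along the spin axis. The cdf below is a hypothetical stand-in, and all variable names are made up for this example.

```r
n_max <- 200
k <- 3
q <- 1 - 0.95^(1:n_max)            # hypothetical cdf, for illustration only

p      <- diff(c(0, q))            # Pr(finish in exactly round n)
s_now  <- 1 - q                    # Pr(not done by round n)
s_prev <- 1 - c(0, q[-n_max])      # Pr(not done by round n-1)

spins <- numeric(k * n_max)        # spins[m] = Pr(game ends on spin m)
for(j in 1:k) {
    idx <- k * ((1:n_max) - 1) + j # spin number when player j wins in round n
    spins[idx] <- p * s_now^(j-1) * s_prev^(k-j)
}
```

With this toy cdf every spin ends the game with the same probability, so `spins` is a plain geometric distribution and no sawtooth appears; plugging in the real cdf is what would produce the pattern in the figure.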


Chutes & ladders: How long is this going to take?

17 May 2013

I was playing Chutes & Ladders with my four-year-old daughter yesterday, and I thought, “How long is this going to take?”

I saw an interesting mathematical analysis of the game a few years ago, but it seems to be offline, though you can read it via the wayback machine.

But that didn’t answer my specific question, namely, “How long is this going to take?”

So I wrote a bit of R code to simulate the game.

Here’s the distribution of the number of spins to complete the game, by number of players:

No. spins in chutes & ladders

With two players, the average number of spins is 52, with a 90th percentile of 88.

If you add a third player, the average increases to 65, and the 90th percentile increases to 103. You’re playing fewer rounds, but each round is three times as long. If you add a fourth player, the average is 76 and the 90th percentile is 117.

So, in trying to minimize the agony, it seems best to not encourage my eight-year-old son to join us in the game. If he plays with us, there’s a 63% chance that it will take longer.

And that’s particularly true because then the chance of my daughter winning drops from about 1/2 to about 1/3.

That raises another question: if I let her go first, what advantage does that give her? Not much. The chance that the person who goes first will win is 50.9%, 34.4%, and 25.9%, respectively, when there are 2, 3, and 4 players. So not a noticeable amount. Thus I cheat (on her behalf). Really, though, I’m cheating in order to shorten the game as much as to ensure that she wins.
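For anyone who wants to try this, a simulation along these lines might look like the sketch below. It is not the code behind the figures above: the board layout (the commonly cited Milton Bradley ladders and chutes) and the rule that a spin overshooting 100 is forfeited are assumptions, and `jumps` and `play_game` are names made up for this example.

```r
set.seed(20130517)

# Assumed board: names are ladder/chute starts, values are where they lead.
jumps <- c("1"=38, "4"=14, "9"=31, "21"=42, "28"=84, "36"=44, "51"=67,
           "71"=91, "80"=100,
           "16"=6, "47"=26, "49"=11, "56"=53, "62"=19, "64"=60,
           "87"=24, "93"=73, "95"=75, "98"=78)

# Play one game with k players; return total spins and the winner's seat.
play_game <- function(k) {
    pos <- rep(0, k)
    spins <- 0
    repeat {
        for(j in 1:k) {
            spins <- spins + 1
            to <- pos[j] + sample(6, 1)
            if(to <= 100) {                  # assumed rule: overshoot = no move
                jump <- jumps[as.character(to)]
                pos[j] <- if(is.na(jump)) to else unname(jump)
            }
            if(pos[j] == 100) return(c(spins=spins, winner=j))
        }
    }
}

sims <- t(replicate(2000, play_game(2)))
mean(sims[, "spins"])                # average no. spins, 2 players
quantile(sims[, "spins"], 0.9)       # 90th percentile
mean(sims[, "winner"] == 1)          # how often the first player wins
```

Many more replicates than 2000 would be needed to pin down something as small as the first-player advantage.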

Note: There’s a close connection between this problem and my work on the multiple-strain recombinant inbred lines. (See this and that.) I’m tempted to play around with it some more.

Additional numerical results here.

Stack Exchange: Why I dropped out

13 May 2013

Stack Exchange is a series of question-and-answer sites, including Stack Overflow for programming and Cross Validated for statistics. I was introduced to these sites at a short talk by Barry Rowlingson at the 2011 UseR! meeting, “Why R-help must die!”

These sites have a lot of advantages over R-help: The format is easier to read, math and code can be nicely formatted, the questions are tagged, search is easier, and there should be less redundancy.

Additional pros

  • It’s good to help people.
  • It’s fun to rack up reputation points for helping people.
  • It’s good exercise, in both thinking about statistical questions and in articulating useful answers (and there are some interesting questions).

However, some cons

So I gave up

I started spending time on Stack Overflow and Cross Validated soon after returning from UseR! 2011, but I lost my patience and quit within three months.

One needs to treat each question with respect, and I eventually seemed to lose my ability to sustain such goodwill. I think I take things too personally.

Update

I should clarify: I do continue to use Stack Exchange, mostly through Google. Many problems I run into have already been answered. I just don’t have the right temperament to participate regularly in answering others’ questions.

Tutorials on git/github and GNU make

10 May 2013

If you’re not using version control, you should be. Learn git.

If you’re not on github, you should be. That’s real open source.

To help some colleagues get started with git and github, I wrote a minimal tutorial. There are lots of git and github resources available, but I thought I’d give just the bare minimum to get started; after using git and github for a while, other resources make a lot more sense and seem much more worthwhile.

And for R folks, note that it’s easy to install R packages that are hosted on github, using Hadley Wickham’s devtools package. For example, to install Nacho Caballero’s clickme package:

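# install devtools from CRAN (only needed once)
install.packages("devtools")
library(devtools)
# current devtools takes the repository as "username/repo"
install_github("nachocab/clickme")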

Having written that git/github tutorial, I thought: I should write more such!

So I immediately wrote a similar short tutorial on GNU make, which I think is the most important tool for reproducible research.

“My” chromosome 8p inversion

8 May 2013

There was lots of discussion on twitter yesterday about Graham Coop’s paper with Peter Ralph (or vice versa), on The geography of recent genetic ancestry across Europe, particularly regarding the FAQ they’d created.

I was eager to take a look, and, it’s slightly embarrassing to say, I first did a search to see if they’d made a connection to any of my work. (I’m probably not the only one to do that.) Sure enough, they cited a paper of mine, but it was Giglio et al. (2001) Am J Hum Genet 68: 874–883, on “my” chr 8p inversion, and not what I’d expected, my autozygosity paper.

What did the chr 8p inversion have to do with this? Search for “[36]” and you’ll find:

We find that the local density of IBD blocks of all lengths is relatively constant across the genome, but in certain regions the length distribution is systematically perturbed (see Figure S1), including around certain centromeres and the large inversion on chromosome 8 [36], also seen by [35].

The chr 8p inversion presents an interesting data analysis story from my postdoc years. In a nutshell: I was studying human crossover interference and found poor model fit for maternal chr 8. The poor fit was due to tight apparent triple-crossovers in two individuals in each of two families. I hypothesized that there was an inversion in the region, though it would have to be long and have both orientations common. The inversion was confirmed via FISH; it’s something like 5 Mbp long, with the frequencies of the two orientations being 40 and 60% in people of European ancestry.


$18 for a two page PDF? I still don’t get it.

2 May 2013

Yesterday, I saw this tweet by @Ananyo

Time that biologists stopped telling the public oversimplistic fairy tales on Darwinian evolution, says P Ball ($) nature.com/nature/journal…

So I clicked the link to the Nature paper and realized, “Oh, yeah. I’ve got to enter through the UW library website.”

But then I thought, “Wait…$18 for a two-page Nature comment? WTF?”

So I tweeted:

DNA: Celebrate the unknowns, like this Nature comment, which costs $18. nature.com/nature/journal…

And thinking about it some more, I got more annoyed, and tweeted:

Why do publishers charge such high per-article fees? At $18/artcl, you’d have to be desperate or stupid to pay; at $1-2, prob’ly lots would.

And then I thought, I’ll ask Nature directly:

@NatureMagazine Why is the per-article charge so high? It seems like you’d make more profit at $2/article.

And they responded:

@kwbroman For a while now, individual papers can be rented through @readcube for $3-5. A full tablet subscription to Nature costs $35.

But that didn’t quite answer my question. So I asked:

.@NatureMagazine So is the $18 charge for a 2 pg PDF just to discourage piracy?

I thought a lot about whether to put “piracy” in quotes or not, or whether to write “copyright infringement” instead.

But anyway, they responded:

@kwbroman just as with any product, the more you buy, the more you save. Media/publishing subscriptions have worked this way for decades.

That again didn’t quite answer my question.

It’s a scam

I still don’t understand the $18 business. It’s not “The more you buy, the more you save.” It’s, “Buy the whole season for $35, or buy 5 min from Episode 1 for $18.”

I understand that the cover price of Wired is $5 per issue, while I could get a year’s subscription for $15-20. But that’s not the same as $18 for one article vs $200 per year.

The $18 for a two-page PDF is like 900 numbers and paycheck advances. These are scams taking advantage of desperate or stupid people.

If they don’t want to sell the PDFs for individual articles for a reasonable price, they should just not sell them at all.