Halloween 2016 count

31 Oct 2016

Here’s a graph of the numbers of trick-or-treat-ers we saw this evening, by time. 10 of the 25 kids arrived in one big group. (Compare this to our 2011 experience.)

[Figure: Halloween 2016 count]

My JSM 2016 itinerary

27 Jul 2016

The Joint Statistical Meetings are in Chicago next week. I thought I’d write down the set of sessions that I plan to attend. Please let me know if you have further suggestions.

First things first: snacks. Search the program for “spotlight” or “while supplies last” to find the free snacks being offered. Or go to the page with the full list.

Read the rest of this entry »

Chris Walker at Faculty Senate

15 Apr 2016

Chris Walker’s powerful speech at the Faculty Senate on 4 Apr 2016 (see “Hateful shit at UW-Madison”) was recorded!

You must listen to it!

I am a data scientist

8 Apr 2016

Three years ago this week, I wrote a blog post, “Data science is statistics”. I was fiercely against the term at that time, as I felt that we already had a data science, and it was called Statistics.

It was a short post, so I might as well quote the whole thing:

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

If you say that one kind of data analysis is statistics and another kind is not, you’re not allowing innovation. We need to define the field broadly.

You may not like what some statisticians do. You may feel they don’t share your values. They may embarrass you. But that shouldn’t lead us to abandon the term “statistics”.

I still sort of feel that way, but I must admit that my definition of “statistics” is rather different from most others’. In my view, a good statistician will consider all aspects of the data analysis process:

  • the broader context of a scientific question
  • study design
  • data handling, organization, and integration
  • data cleaning
  • data visualization
  • exploratory data analysis
  • formal inference methods
  • clear communication of results
  • development of useful and trustworthy software tools
  • actually answering real questions

I’m sure I missed some things there, but my main point is that most academic statisticians focus solely on developing “sophisticated” methods for formal inference. I agree that that is an important piece, but in my experience as an applied statistician, the other aspects are often of vastly greater importance. In many cases, we don’t need to develop sophisticated new methods; most of my effort is devoted to the other aspects, which academic statisticians generally treat as unworthy of consideration.

As I wrote in a later post, “Reform academic statistics”, we as a field appear satisfied with

  • Papers that report new methods with no usable software
  • Applications that focus on toy problems
  • Talks that skip the details of the scientific context of a problem
  • Data visualizations that are both ugly and ineffective

Discussions of Data Science generally recognize the full range of activities that are required for the analysis of data, and place greater value on such things as data visualization and software tools, which are obviously important but not viewed as such by many statisticians.

And so I’ve come to embrace the term Data Science.

Data Science is also a much more straightforward and understandable label for what I do. I don’t think we should need a new term, and I think we should argue against misunderstandings of Statistics rather than slink off to a new “brand”. But in general, when I talk about Data Science, I feel I can better trust that folks will understand that I am talking about the broad set of activities required in good data analysis.

If people ask me what I do, I’ll continue to say that I’m a Statistician, even though I do tend to stumble over the word. But I am also a Data Scientist.

One last thing: I’ve also come to realize that computer science folks working in computational biology are really just like me. They have expertise in a somewhat different set of tools, but then that’s true for pretty much every statistician, too: they’re much like me but they have expertise in a somewhat different set of tools. And it’s nice to be able to say that we’re all data scientists.

It should be recognized, too, that academic computer science suffers from many of the same problems that academic statistics has suffered: an overemphasis on novelty, sophistication, and toy applications, and an under-appreciation for solving real problems, for data visualization, and for useful software tools.

Action items in response to hateful shit

5 Apr 2016

UW-Madison faculty got an email update from Vice Provost and Chief Diversity Officer Patrick Sims regarding the things we can do in response to the hate and bias incidents on campus.

Here are the things he had mentioned yesterday at the Faculty Senate meeting:

  • Address hate/bias incidents in your curriculum to ameliorate unacceptable occurrences in our campus community.
  • Look at “bullying” language as a way to address possible hate/bias incidents in the classroom.
  • Commit to engaging in ongoing cultural competency training. Learning Communities for Institutional Change & Excellence (LCICE) as an infrastructure already provides these services campus-wide.
  • Commit to experiencing the leadership institute and become a facilitator, carving out 10-15% of your time towards these efforts.
  • Support the request for additional staff.
  • Visit the Campus Climate website

An attached letter from the Hate & Bias incident team added:

  • Your school/college/department can host a bystander intervention workshop on hate and bias. This workshop will provide tools for UW-Madison community members on when and how to intervene. If you would like to host a workshop, please contact Joshua Moon Johnson.
  • Many incidents go unreported for a variety of reasons. We encourage students and campus community members to report incidents of hate and bias to ensure that campus can best support the victim and work to prevent future incidents. We encourage you to post the link to report on your school/college/department websites.
  • Oftentimes students do not report incidents because they are unaware of the reporting process. To increase awareness of the reporting process, we encourage you to share brochures and posters with information on how and why it is important to report. These will be distributed across campus in the next few weeks.
  • Students who are victims of hate and bias incidents may need immediate support. Please be sure to refer/provide students with appropriate resources such as mental health/counseling services through University Health Services (UHS). The Multicultural Student Center also has drop-in hours with UHS counselors as well as support and discussions groups for students of color.
  • Many students who are victims of hate and bias incidents identify with an underrepresented racial group, gender identity or sexual orientation, or religious group. We encourage you to specifically reach out to marginalized student groups to raise awareness of the bystander intervention workshop and reporting process.

I got a reasonably positive response to my email to my faculty colleagues suggesting that we all commit to cultural competency training. But the training from the LCICE mentioned above looks to be semester-long, Tuesdays 4:30-7:30pm. I think I’ll have a difficult time convincing my colleagues of that. We need something in between nothing and 45 hours.

Hateful shit at UW-Madison

4 Apr 2016

I’m a privileged white male university professor. As privileged as they come, really. My father was a professor of chemistry; my mother also has an advanced degree in chemistry. The jobs I’ve held have been more about personal fulfillment than money: dancer, dance teacher, secretary for intellectual property lawyers, research and teaching assistant, professor. People assume I know what I’m talking about, even if I’m in shorts and a t-shirt.

All that’s just to say that, when it comes to the ongoing hateful acts that have been happening at the University of Wisconsin-Madison, I’m really the last one that you should be listening to. You should instead listen to UW students, such as the United Council of UW Students, who have submitted a list of 5 reasonable demands, or Vice Provost and Chief Diversity Officer Patrick Sims, who made an important 8-min video in response to a recent hateful incident that you should now go away and watch (really, stop reading what I have to say and spend 8 minutes watching that video), or Chris Walker, Asst Prof in the dance department, who spoke movingly today at the UW-Madison Faculty Senate meeting about the shit that faculty and students of color have to put up with on campus.

Lots of crap has been happening in Wisconsin lately. My focus has been on what Scott Walker and company have been doing to the state and to the University of Wisconsin, most recently by making huge cuts to state support to the UW System and by weakening tenure and shared governance.

That’s all been an embarrassment, and depressing, but in comparison to the hateful racist shit that’s been happening on campus (Vice Provost Sims reported that there have been >30 reported hate or bias incidents on campus this year), tenure and funding just don’t seem that important.

Chris Walker’s speech at the Faculty Senate today really hammered this home. As a black man on campus, he’s experienced a lot of shit: worse shit than we’re seeing in the papers. And if we don’t fix this, our students can’t be successful. We must fix this.

What can a biostatistics professor do? I’m open to suggestions.

But for now, I’ll follow Patrick Sims’s suggestion and start with one of the United Council of UW Students’ demands:

We demand that the University of Wisconsin System creates and enforces comprehensive racial awareness and inclusion curriculum and trainings throughout all 26 UW Institution departments, mandatory for all students, faculty, staff, campus & system administration, and regents. This curriculum and training must be vetted, maintained, and overseen by a board comprised of students, staff, and faculty of color.

I’ve written an email to the faculty in my department, asking that we, as a department, volunteer to participate in such racial awareness training:

[Image: email to department]

Correction: There’s an error in my email; Chris Walker is Associate Professor, and has been for a couple of years.

Update: Chris Walker’s speech at the 4 Apr 2016 Faculty Senate meeting was recorded! Must listen.

Write unit tests!

7 Dec 2015

Since 2000, I’ve been working on R/qtl, an R package for mapping the genetic loci (called quantitative trait loci, QTL) that contribute to variation in quantitative traits in experimental crosses. The Bioinformatics paper about it is my most cited; also see my 2014 JORS paper, “Fourteen years of R/qtl: Just barely sustainable.”

It’s a bit of a miracle that R/qtl works and gives the right answers, as it includes essentially no formal tests. The only regular tests are that the examples in the help files don’t produce any errors that halt the code.

I’ve recently been working on R/qtl2, a reimplementation of R/qtl to better handle high-dimensional data and more complex crosses, such as Diversity Outbred mice. In doing so, I’m trying to make use of the software engineering principles that I’ve learned over the last 15 years, which pretty much correspond to the ideas in “Best Practices for Scientific Computing” (Greg Wilson et al., PLOS Biology 12(1): e1001745, doi:10.1371/journal.pbio.1001745).

I’m still working on “Make names consistent, distinctive, and meaningful”, but I’m doing pretty well on writing shorter functions with less repeated code, and particularly importantly I’m writing extensive unit tests.
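
As a minimal sketch of what a unit test looks like, here is a pair of small functions with a few checks on them. The function names (rf_to_cM, cM_to_rf, test_map_functions) are hypothetical and not taken from R/qtl or R/qtl2, and the checks use only base R (stopifnot and all.equal); packages commonly use a testing framework such as testthat instead, but the idea is the same.

```r
# Hypothetical helpers: convert between recombination fraction and
# map distance (in cM) under the Haldane map function.
rf_to_cM <- function(rf) {
    stopifnot(all(rf >= 0 & rf < 0.5))
    -50 * log(1 - 2 * rf)
}

cM_to_rf <- function(d) {
    stopifnot(all(d >= 0))
    0.5 * (1 - exp(-d / 50))
}

# Unit tests: each checks one small, known property of the functions.
test_map_functions <- function() {
    # zero recombination corresponds to zero distance
    stopifnot(isTRUE(all.equal(rf_to_cM(0), 0)))

    # the two functions are inverses of each other
    rf <- c(0.01, 0.1, 0.25, 0.4)
    stopifnot(isTRUE(all.equal(cM_to_rf(rf_to_cM(rf)), rf)))

    # very large distances approach a recombination fraction of 1/2
    stopifnot(isTRUE(all.equal(cM_to_rf(1e4), 0.5)))

    # invalid input gives an error rather than a silent NaN
    res <- try(rf_to_cM(0.6), silent = TRUE)
    stopifnot(inherits(res, "try-error"))

    invisible(TRUE)
}

test_map_functions()
```

Each check is tiny, but together they would catch a sign error, a wrong constant, or a change in how bad input is handled.
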
Read the rest of this entry »

Fitting linear mixed models for QTL mapping

24 Nov 2015

Linear mixed models (LMMs) have become widely used for dealing with population structure in human GWAS, and they’re becoming increasingly important for QTL mapping in model organisms, particularly for the analysis of advanced intercross lines (AIL), which often exhibit variation in the relationships among individuals.

In my efforts on R/qtl2, a reimplementation of R/qtl to better handle high-dimensional data and more complex cross designs, it was clear that I’d need to figure out LMMs. But while papers explaining the fit of LMMs seem quite explicit and clear, I’d never quite turned the corner to actually seeing how I’d implement it. In both reading papers and studying code (e.g., lme4), I’d be going along fine and then get completely lost part-way through.

But I now finally understand LMMs, or at least a particular, simple LMM, and I’ve been able to write an implementation: the R package lmmlite.

It seemed worthwhile to write down some of the details.
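
To make the idea concrete, here is a rough sketch of one common way to fit a simple LMM with a single random effect whose covariance is proportional to a kinship matrix K. It is not necessarily how lmmlite does it, and the function names (eigen_rotate, loglik_hsq, fit_lmm) and the simulated data are made up for illustration. The assumed model is y = Xb + e with var(e) = sigma^2 [hsq K + (1 - hsq) I]. Rotating by the eigenvectors of K makes the residuals independent, so for a fixed heritability hsq the fit is just weighted least squares, and hsq is found by a one-dimensional search over the (maximum likelihood, not REML) log likelihood.

```r
# Rotate y and X by the eigenvectors of K; after rotation the residual
# variances are proportional to hsq * d + (1 - hsq), with d the eigenvalues.
eigen_rotate <- function(K, y, X) {
    e <- eigen(K, symmetric = TRUE)
    list(d = e$values,
         y = drop(crossprod(e$vectors, y)),
         X = crossprod(e$vectors, X))
}

# ML log likelihood for a fixed heritability hsq, profiling out the
# regression coefficients and sigma^2 by weighted least squares.
loglik_hsq <- function(hsq, rot) {
    v <- hsq * rot$d + (1 - hsq)      # relative residual variances
    w <- 1 / v                        # weights for least squares
    fit <- lm.wfit(rot$X, rot$y, w)
    n <- length(rot$y)
    sigmasq <- sum(w * fit$residuals^2) / n
    -0.5 * (n * log(2 * pi * sigmasq) + sum(log(v)) + n)
}

# Maximize the log likelihood over hsq in (0, 1).
fit_lmm <- function(K, y, X) {
    rot <- eigen_rotate(K, y, X)
    opt <- optimize(loglik_hsq, interval = c(0, 1), rot = rot,
                    maximum = TRUE)
    list(hsq = opt$maximum, loglik = opt$objective)
}

# Simulated example (all values here are arbitrary)
set.seed(1)
n <- 100
Z <- matrix(rnorm(n * 50), n, 50)
K <- tcrossprod(Z) / 50                   # kinship-like matrix
X <- cbind(1, rnorm(n))                   # intercept plus one covariate
g <- drop(Z %*% rnorm(50)) / sqrt(50)     # polygenic effect
y <- drop(X %*% c(1, 0.5)) + g + rnorm(n)

fit <- fit_lmm(K, y, X)
```

A useful sanity check on code like this is that at hsq = 0 the log likelihood should match that of ordinary least squares, since the rotation is orthogonal and all of the weights are 1.
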

Read the rest of this entry »

Session info from R/Travis

25 Sep 2015

For the problem I reported yesterday, in which my R package was working fine locally but failing on Travis, the key solution is to run update.packages(ask=FALSE) locally, and maybe even update.packages(ask=FALSE, type="source") to be sure to grab the source of packages for which binaries are not yet available. I now know to do that.

In addition, it’d be useful to have session information (R and package versions) in the results from Travis. This has proven a bit tricky.

If you don’t want to go with a fully custom Travis script, your customization options are limited. We really only care about the case of a failure, so after_success is not of interest, and after_script seems not to be run if the Travis build fails. Moreover, script and after_failure are defined by the main language: r script, so you can’t change them without going all-custom.

What’s left is before_script.

I want to see the result of devtools::session_info() with the package of interest loaded, but the package actually gets built after before_script is run, so we’ll need to build and install it, even though it’ll be built and installed again afterwards. The best I could work out is in this example .travis.yml file, with the key bits being:

before_script:
  - export PKG_NAME=$(Rscript -e 'cat(paste0(devtools::as.package(".")$package))')
  - export PKG_TARBALL=$(Rscript -e 'pkg <- devtools::as.package("."); cat(paste0(pkg$package,"_",pkg$version,".tar.gz"))')
  - R CMD build --no-build-vignettes .
  - R CMD INSTALL ${PKG_TARBALL}
  - rm ${PKG_TARBALL}
  - echo "Session info:"
  - Rscript -e "library(${PKG_NAME});devtools::session_info('${PKG_NAME}')"

I use --no-build-vignettes in R CMD build as otherwise the package would be built and installed yet another time. And I remove the .tar.gz file afterwards, to avoid having the later check complain about the extra file.

Here’s an example of the session info in the Travis log.

If you have suggestions about how to simplify this, I’d be happy to hear them. I guess the key would be to have the main Travis script for R revised to report session information.

Thanks to Jenny Bryan for showing me how to search for instances of session_info in .travis.yml files on GitHub, and to Carson Sievert for further moral support.

It’s not you, it’s me

24 Sep 2015

Somehow when my code stops working, my first (and second, and third) reaction is to blame everything except my own code. (“It’s not me, it’s you.”)

And almost always, it’s my own code that’s the problem (hence the title of this post).

I spent the day trying to resolve a bug in my early-in-development R package, qtl2geno. In the process, I blamed

  • TravisCI for not handling system.file() properly.
  • R-devel for having broken system.file().
  • data.table::fread() for treating sep=NULL differently on different operating systems.

Of course, none of these were true. I was just passing sep=NULL to data.table::fread(), which worked in the previous version but doesn’t work in the latest release on CRAN. I hadn’t yet installed the latest version of data.table on my Mac, but Travis and my junky Windows laptop had the latest version.
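
A quick way to spot this sort of mismatch is to print the installed version of the suspect package on each machine and compare. Here is a small sketch using only base R; the helper names (installed_version, at_least) are made up for illustration, and the version number in the last line is an arbitrary example, not the data.table release in question.

```r
# Installed version of a package as a string, or NA if it isn't installed.
installed_version <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
    as.character(utils::packageVersion(pkg))
}

# Is the installed version at least min_version?
at_least <- function(pkg, min_version) {
    v <- installed_version(pkg)
    !is.na(v) && package_version(v) >= package_version(min_version)
}

# Compare the output of these lines across machines
installed_version("data.table")
at_least("data.table", "1.0.0")   # "1.0.0" is an arbitrary example
```
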

The debugging process seems a potentially interesting case study, so I thought I’d write down some of the details.

Read the rest of this entry »