Posts Tagged ‘teaching’

Cheat sheets for R-based Software Carpentry course

29 Apr 2015

At the Software Carpentry workshop at UW-Madison in August, 2014, one of the students suggested that we hand out some cheat sheets on each topic. I thought that was a really good idea.

So at the SWC workshop at Washington State University this week, we handed out the following five pages:

I really appreciate the work (and design sense) that was put into these.

Reform academic statistics

1 May 2014

Terry Speed recently gave a talk on the role of statisticians in “Big Data” initiatives (see the video or just look at the slides). He points to the history of statisticians’ discussions of massive data sets (e.g., the Proceedings of a 1998 NRC workshop on Massive data sets) and how this history is being ignored in the current Big Data hype, and that statisticians, generally, are being ignored.

I was thinking of writing a polemic on the need for reform of academic statistics and biostatistics, but in reading back over Simply Statistics posts, I’ve decided that Rafael Irizarry and Jeff Leek have already said what I wanted to say, and so I think I’ll just summarize their points.

Following the RSS Future of the Statistical Sciences Workshop, Rafael was quite optimistic about the prospects for academic statistics, as he noted considerable consensus on the following points:

  • We need to engage in real present-day problems
  • Computing should be a big part of our PhD curriculum
  • We need to deliver solutions
  • We need to improve our communication skills

Jeff said, “Data science only poses a threat to (bio)statistics if we don’t adapt,” and made the following series of proposals:

  • Remove some theoretical requirements and add computing requirements to statistics curricula.
  • Focus on statistical writing, presentation, and communication as a main part of the curriculum.
  • Focus on positive interactions with collaborators (being a scientist) rather than immediately going to the referee attitude.
  • Add a unit on translating scientific problems to statistical problems.
  • Add a unit on data munging and getting data from databases.
  • Integrating real and live data analyses into our curricula.
  • Make all our students create an R package (a data product) before they graduate.
  • Most important of all have a “big tent” attitude about what constitutes statistics.

I agree strongly with what they’ve written. To make it happen, we ultimately need to reform our values.

Currently, we (as a field) appear satisfied with

  • Papers that report new methods with no usable software
  • Applications that focus on toy problems
  • Talks that skip the details of the scientific context of a problem
  • Data visualizations that are both ugly and ineffective

Further, we tend to get more excited about the fanciness of a method than its usefulness.

We should value

  • Usefulness above fanciness
  • Tool building (e.g., usable software)
  • Data visualization
  • In-depth knowledge of the scientific context of a problem

In evaluating (bio)statistics faculty, we should consider not just the number of JASA or Biometrics papers they’ve published, but also whether they’ve made themselves useful, both to the scientific community and to other statisticians.

Startling lack of training in statistical computing

14 Mar 2014

It is shocking to me that a statistics department would offer a graduate-level statistical computing course only every fourth year.

I had been arguing for a statistical programming course: that we supplement the usual course on the theory of statistical computing (numerical linear algebra, EM algorithm, MCMC, etc.) with a course on the practice of statistical computing.

But I was assuming that the more theoretical statistical computing course was actually being taught.

If a department teaches a course at a frequency less than every-other-year, it’s unavailable to many students, or it comes too late in their training to be useful. And statistical computing should really be considered part of a statistics department’s core curriculum.

Update: As you might have anticipated, I’ve been asked to teach the course.

Copyright of video lectures

9 Feb 2014

Quite a while back, I was wondering about copyright of video lectures produced by university faculty. In particular, did I need to get the university to sign some sort of waiver in order for the Jackson Laboratory to post a video of a lecture I’d given for a course there? (The Jackson Lab lawyers wanted that.)

I spoke to my family librarian, who pointed me to Carrie Kruse, director of the College Library at UW-Madison. And really, librarians are the people to talk to about this sort of thing, as they not only think a lot about copyright and fair use, but they’re also on the side of more access. (But don’t hold Carrie accountable if I say anything wrong below; this is my own interpretation, 14 months after corresponding with her.)

In most jobs, the product of your work is owned by the company, even if they didn’t have anything to do with it. But universities have a different tradition. Typically, the university doesn’t assert any rights over a faculty member’s instructional materials. For example, if you write a textbook, you don’t have to negotiate with the university over its publication, nor do you have to give the university a cut of the royalty income. That’s different than patents.

UW-Madison has an explicit policy about faculty instructional materials. (Really, it’s a UW System policy.) As I understand it, the university will assert some rights over your instructional materials only if they had contributed special resources or support to their creation (for example, if university staff assisted you with the recording and editing of a video).

Returning to the video issue: since the university wasn’t involved in the production of the video, I didn’t have to get their okay.

Carrie mentioned another important thing to pay attention to: if I had used any photos or other media to which I don’t have rights, I need to be careful about their inclusion in videos posted online. Within a classroom, or in a video posted online but only made accessible to a defined group of students, inclusion of such material could fall under fair use. But if such material is included in a video that is posted online for general viewing, others may question my fair use claim.

That may explain why so many instructors here are using password-protected sites, like Learn@UW and Moodle. I can’t even look at my colleagues’ course material.

I dislike building web pages via online forms. (Well, except for you, wordpress; I wish I’d started this blog with GitHub pages, but you’re okay.)

And I despise the password protection of instructional materials. If I spend a bunch of time preparing material, I want to distribute it as widely as possible. If another instructor uses it in their own class, I consider that a Good Thing.

It was a really bad idea to use slides in that class

8 Oct 2013

I gave a presentation in the Statistical Consulting course at UW–Madison today. I’ve done so a number of times in the past 6 years. Until today, I’d just spoken informally from a few pages of notes. (Earlier this year, I wrote up those notes as a blog post.)

This year, just 45 min before the class, I thought I’d quickly create some slides to present. I thought it’d be an interesting “experiment” (not in the formal sense).

The outcome was pretty clear: It was easy to create a bunch of bullet-point-based slides. They look nice. (See the pdf here; source here.)

But the slides themselves were worse than useless: they were unnecessary, and they interfered with the desired informal nature of the discussion.

I won’t be using those slides again. I’ll go back to just talking from notes.

Fortunately, the students were really good and involved and asked great questions, anyway. So no real harm done.

Tutorials on git/github and GNU make

10 May 2013

If you’re not using version control, you should be. Learn git.

If you’re not on github, you should be. That’s real open source.

To help some colleagues get started with git and github, I wrote a minimal tutorial. There are lots of git and github resources available, but I thought I’d give just the bare minimum to get started; after using git and github for a while, other resources make a lot more sense and seem much more worthwhile.

And for R folks, note that it’s easy to install R packages that are hosted on github, using Hadley Wickham’s devtools package. For example, to install Nacho Caballero’s clickme package:

library(devtools)
install_github("nachocab/clickme")

Having written that git/github tutorial, I thought: I should write more such!

So I immediately wrote a similar short tutorial on GNU make, which I think is the most important tool for reproducible research.

UW-Madison joins Coursera

21 Feb 2013

UW-Madison is joining Coursera to offer four massive open online courses (MOOCs).

I like Sara Goldrick-Rab’s comments.

MOOCs are a good thing, but I view them as “outreach”. There are lots of problems with universities, but MOOCs don’t seem to be a solution to any of them.

I like Sara’s point about MOOCs being used more for continuing education for the already educated than as new access for the uneducated.

A course in statistical programming

25 May 2012

Graduate students in statistics often take (or at least have the opportunity to take) a statistical computing course, but often such courses are focused on methods (like numerical linear algebra, the EM algorithm, and MCMC) and not on actual coding.

For example, here’s a course in “advanced statistical computing” that I taught at Johns Hopkins back in 2001.

Many (perhaps most) good programmers learned to code outside of formal courses. But many statisticians are terrible programmers and would benefit from a formal course.

Moreover, applied statisticians spend the vast majority of their time interacting with a computer and would likely benefit from more formal presentations of how to do it well. And I think this sort of training is particularly important for ensuring that research is reproducible.

One really learns to code in private, struggling over problems, but I benefited enormously from a statistical computing course I took from Phil Spector at Berkeley.

Brian Caffo, Ingo Ruczinski, Roger Peng, Rafael Irizarry, and I developed a statistical programming course at Hopkins that (I think) really did the job.

I would like to develop a similar course at Wisconsin: on statistical programming, in the most general sense.

I have in mind several basic principles:

  • be self-sufficient
  • get the right answer
  • document what you did (so that you will understand what you did 6 months later)
  • if primary data change, be able to re-run the analysis without a lot of work
  • are your simulation results reproducible?
  • reuse of code (others’ and your own) rather than starting from scratch every time
  • make methods accessible to (and used by) others
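
As one small illustration of the simulation-reproducibility principle above, here is a minimal sketch in R. The function name, the seed value, and the toy simulation itself (means of exponential draws) are all made up for illustration; the only real point is that fixing the random number generator's state with set.seed() lets you re-run a simulation and get the same numbers.

```r
# Hypothetical toy simulation: sampling distribution of the mean of n
# exponential draws. The function name, seed, and parameters are
# illustrative assumptions, not taken from any actual course material.
simulate_means <- function(n_sim, n, seed) {
  set.seed(seed)  # fix the RNG state so that the run can be repeated
  replicate(n_sim, mean(rexp(n, rate = 1)))
}

run1 <- simulate_means(n_sim = 1000, n = 25, seed = 20120525)
run2 <- simulate_means(n_sim = 1000, n = 25, seed = 20120525)

# With the same seed, the same code, and the same version of R,
# the two runs should agree exactly.
identical(run1, run2)
```

Saving the seed alongside the results (and noting the R version, for example via sessionInfo()) is probably the cheapest way to be able to answer "yes" to that question six months later.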

Here are my current thoughts about the topics to include in such a course. The key aim would be to make students aware of the basic principles and issues: to give them a good base from which to learn on their own. Homework would include interesting and realistic programming assignments, plus creating a Sweave-type document and an R package.

  • Basic unix tools (find; df; top; ps ux; grep); unix on Mac and Windows
  • Emacs/vim/other editors (RStudio/Eclipse)
  • LaTeX (for papers; for presentations)
  • slides for talks; posters; figures/tables
  • Advanced R (fancy data structures; functions; object-oriented stuff)
  • Advanced R graphics
  • R packages
  • Sweave/asciidoc/knitr
  • minimal Perl (or Python or Ruby); example of data manipulation
  • Minimal C (or C++); examples of speed-up
  • version control (e.g., git or mercurial); backups
  • reproducible research ideas
  • data management
  • managing projects: data, analyses, results, papers
  • programming style (readable, modular); general but not too general
  • debugging/profiling/testing
  • high-throughput computing; parallel computing; managing big jobs
  • finding answers to questions: man pages; documentation; web
  • more on visualization; dynamic graphics
  • making a web page; html & css; simple cgi-type web forms?
  • writing and managing email
  • managing references to journal articles

Sports statistics

7 Nov 2011

There was an article in the New York Times on Sunday about teaching statistics through sports examples.

I personally would avoid sports entirely, as I view the subject as insufficiently serious. Maybe that’s an indication of my being a terrible instructor of introductory statistics: I don’t care that much what the students are interested in.

Certainly lots of statisticians are interested in sports. David Brillinger told me he’d learned a lot of statistics from studying sports. And I’m not completely uninterested in sports: I like to watch football, particularly Nebraska, Green Bay, and Baltimore, and to see Notre Dame or any team from Florida or Texas lose.

But statistics about sports? Yawn.