Archive for the ‘Academics’ Category
UW-Madison faculty got an email update from Vice Provost and Chief Diversity Officer Patrick Sims regarding the things we can do in response to the hate and bias incidents on campus.
Here are the things he had mentioned yesterday at the Faculty Senate meeting:
- Address hate/bias incidents in your curriculum to ameliorate unacceptable occurrences in our campus community.
- Look at “bullying” language as a way to address possible hate/bias incidents in the classroom.
- Commit to engaging in ongoing cultural competency training. Learning Communities for Institutional Change & Excellence (LCICE) as an infrastructure already provides these services campus-wide.
- Commit to experiencing the leadership institute and become a facilitator, carving out 10-15% of your time towards these efforts.
- Support the request for additional staff.
- Visit the Campus Climate website
An attached letter from the Hate & Bias incident team added:
- Your school/college/department can host a bystander intervention workshop on hate and bias. This workshop will provide tools for UW-Madison community members on when and how to intervene. If you would like to host a workshop, please contact Joshua Moon Johnson.
- Many incidents go unreported for a variety of reasons. We encourage students and campus community members to report incidents of hate and bias to ensure that campus can best support the victim and work to prevent future incidents. We encourage you to post the link to report on your school/college/department websites.
- Oftentimes students do not report incidents because they are unaware of the reporting process. To increase awareness of the reporting process, we encourage you to share brochures and posters with information on how and why it is important to report. These will be distributed across campus in the next few weeks.
- Students who are victims of hate and bias incidents may need immediate support. Please be sure to refer/provide students with appropriate resources such as mental health/counseling services through University Health Services (UHS). The Multicultural Student Center also has drop-in hours with UHS counselors as well as support and discussions groups for students of color.
- Many students who are victims of hate and bias incidents identify with an underrepresented racial group, gender identity or sexual orientation, or religious group. We encourage you to specifically reach out to marginalized student groups to raise awareness of the bystander intervention workshop and reporting process.
I got a reasonably positive response to my email to my faculty colleagues suggesting that we all commit to cultural competency training. But the training from the LCICE mentioned above looks to be semester-long, Tuesdays 4:30-7:30pm. I think I’ll have a difficult time convincing my colleagues of that. We need something in between nothing and 45 hours.
I’m a privileged white male university professor. As privileged as they come, really. My father was a professor of chemistry; my mother also has an advanced degree in chemistry. The jobs I’ve held have been more about personal fulfillment than money: dancer, dance teacher, secretary for intellectual property lawyers, research and teaching assistant, professor. People assume I know what I’m talking about, even if I’m in shorts and a t-shirt.
All that’s just to say that, when it comes to the ongoing hateful acts that have been happening at the University of Wisconsin-Madison, I’m really the last one that you should be listening to. You should instead listen to UW students, such as the United Council of UW Students, who have submitted a list of 5 reasonable demands, or Vice Provost and Chief Diversity Officer Patrick Sims, who made an important 8-min video in response to a recent hateful incident that you should now go away and watch (really, stop reading what I have to say and spend 8 minutes watching that video), or Chris Walker, Asst Prof in the dance department, who spoke movingly today at the UW-Madison Faculty Senate meeting about the shit that faculty and students of color have to put up with on campus.
Lots of crap has been happening in Wisconsin lately. My focus has been on what Scott Walker and company have been doing to the state and to the University of Wisconsin, most recently by making huge cuts to state support to the UW System and by weakening tenure and shared governance.
That’s all been an embarrassment, and depressing, but in comparison to the hateful racist shit that’s been happening on campus (Vice Provost Sims reported that there have been >30 reported hate or bias incidents on campus this year), tenure and funding just don’t seem that important.
Chris Walker’s speech at the Faculty Senate today really hammered this home. As a black man on campus, he’s experienced a lot of shit: worse shit than we’re seeing in the papers. And if we don’t fix this, our students can’t be successful. We must fix this.
What can a biostatistics professor do? I’m open to suggestions.
But for now, I’ll follow Patrick Sims’s suggestion and start with one of the United Council of UW Students’ demands:
We demand that the University of Wisconsin System creates and enforces comprehensive racial awareness and inclusion curriculum and trainings throughout all 26 UW Institution departments, mandatory for all students, faculty, staff, campus & system administration, and regents. This curriculum and training must be vetted, maintained, and overseen by a board comprised of students, staff, and faculty of color.
I’ve written an email to the faculty in my department, asking that we, as a department, volunteer to participate in such racial awareness training:
Correction: There’s an error in my email; Chris Walker is Associate Professor, and has been for a couple of years.
Update: Chris Walker’s speech at the 4 Apr 2016 Faculty Senate meeting was recorded! Must listen.
Reproducibility is hard. It will probably always be hard, because it’s hard keeping things organized.
I recently had a paper accepted at G3, concerning a huge set of sample mix-ups in a large eQTL study. I’d discovered and worked out the issue back in December, 2010. I gave a talk about it at the Mouse Genetics meeting in Washington, DC, in June, 2011. But for reasons that I will leave unexplained, I didn’t write it up until much later. I did the bulk of the writing in October, 2012, but it wasn’t until February, 2014, that I posted a preprint at arXiv, which I then finally submitted to G3 in June this year.
In writing up the paper in late 2012, I re-did my entire analysis from scratch, to make the whole thing more cleanly reproducible. So with the paper now in press, I’ve placed all of that in a GitHub repository, but as it turned out, there was still a lot more to do. (You can tell, from the repository, that this is an old project, because there are a couple of Perl scripts in there. I switched from Perl to Python and Ruby a long time ago, though I still can’t commit to just one of the two: I want to stick with Python, since everyone else is using it, but I much prefer Ruby.)
The basic issue is that the raw data is about 1 GB. The clean version of the data is another 1 GB. And then there are the results of various intermediate calculations, some of which are rather slow to compute, which take up another 100 MB. I can’t reasonably put all of that within the GitHub repository.
Both the raw and clean data have been posted in the Mouse Phenome Database. (Thanks to Petr Simecek, Gary Churchill, Molly Bogue, and Elissa Chesler for that!) But the data are in a form that I thought suitable for others, and not quite in the form that I actually used them.
So, I needed to write a script that would grab the data files from MPD and reorganize them in the way that I’d been using them.
In working on that, I discovered some mistakes in the data posted to MPD: there were a couple of bugs in my code to convert the data from the format I was using into the format I was going to post. (So it was good to spend the time on the script that did the reverse!)
In addition to the raw and clean data on MPD, I posted a zip file with the 110 MB of intermediate results on figshare.
In the end, I’m hoping that one can clone the GitHub repository, just run make, and have it download the data and run all of the calculations. If you want to save some time, you could download the zip file from figshare and unzip that, and then run make.
I’m not quite there, but I think I’m close.
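The “download only what’s missing” behavior described above can be sketched in a few lines of Python. (This is a minimal sketch of the pattern, not the actual code in the repository; the function name and the example file path are hypothetical.)

```python
import os
import urllib.request

def fetch_if_missing(url, path):
    """Download url to path, but only if path doesn't already exist.

    Returns True if a download happened, and False if the file was
    already present (say, because the user grabbed the figshare zip
    of intermediate results ahead of time and unzipped it).
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True
```

A make rule for each data file can then just call this, so that `make` after unzipping the figshare file skips the slow downloads and recalculations.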
Aspects I’m happy with
For the most part, my work on this project wasn’t terrible.
- I wrote an R package, R/lineup, with the main analysis methods.
- Re-deriving the full analysis cleanly, in a separate, reproducible document (I used AsciiDoc and knitr), was a good thing.
- The code for the figures and tables is all reasonably clean, and draws from either the original data files or from intermediate calculations produced by the AsciiDoc document.
- I automated everything with GNU Make.
What should I have done differently?
There was a lot of after-the-fact work that I’d rather not have had to do.
Making a project reproducible is easier if the data aren’t that large and so can be bundled into the GitHub repository with all of the code.
With a larger data set, I guess the thing to do is recognize, from the start, that the data are going to be sitting elsewhere. So then, I think you should organize the data in the form you expect to make public, and work from those files.
When you write a script to convert data from one form to another, also write some tests, to make sure that it worked correctly.
And then document, document, document! As with software development, it’s hard to document data or analyses after the fact.
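The point about testing conversion scripts can be sketched as a round-trip check: convert in one direction, convert back, and assert that you recover what you started with. (The converters here are hypothetical stand-ins, not the actual MPD conversion code; the field names are made up.)

```python
def to_public(records):
    """Hypothetical converter: internal records -> rows for posting."""
    return [(r["id"], r["value"]) for r in records]

def from_public(rows):
    """The reverse converter, rebuilding the internal form."""
    return [{"id": i, "value": v} for (i, v) in rows]

# Round-trip test: a bug in either converter shows up as a mismatch.
records = [{"id": "A1", "value": 0.5}, {"id": "B2", "value": 1.2}]
assert from_public(to_public(records)) == records
```

Had I written the forward and reverse converters together, with a check like this, the bugs I later found in the MPD-posted data would likely have surfaced immediately.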
I’m at the 2015 AAAS meeting in San Jose, California. This is definitely not my typical meeting: too big, too broad, and I hardly know anyone here. But here’s a quick (ha ha; yah sure) summary of the meeting so far.
Gerald Fink gave the President’s Address last night. He’s the AAAS President, so I guess that’s appropriate. But after five minutes of really lame simplistic crap (for example, he said something like, “A single picture can destroy our known understanding of the universe,” like innovation and improving our understanding is a bad thing), I left.
Oh, and before that: the emcee of the evening, who introduced Janet Napolitano, totally couldn’t pronounce her last name. (Her remarks, particularly her comments in support of public universities, were quite powerful.) Old dude: practice such things! Your ineptness reveals that you haven’t paid proper attention to her.
A huge meeting, but I know next to no one here. But I ran into Sanjay Shete in the exhibit hall, where I attempted to get two of every tchotchke. (My kids will pitch a fit if one gets something, no matter how lame the thing, and the other doesn’t.) Sanjay was named an AAAS Fellow; that’s why he’s here.
I went to a dozen talks. A half-dozen I really liked.
Alan Aspuru-Guzik talked about how to find (and visualize) useful organic molecules among the 10^60 (or 10^180?) possible. Cool high-throughput computing and interactive graphics to produce better solar panels (particularly for developing countries) and huge batteries to store wind- and solar-based power.
Russ Altman talked about how to search databases, web-search histories, and social media, to identify pairs of drugs that, together, give bad (or good) side effects that wouldn’t be predicted from their on-their-own side effects.
David Altshuler had a hilarious outline slide for his talk, but the rest was really awesome. A key point: to develop precision medicine will require hard work and there’s no magic bullet. And basic (not just translational) research is critical: we can’t make a medicine that gets to the precise cause (and that’s what precision medicine is about) if we don’t understand that basic biology.
I gave a talk myself, in a session on visualization of biomedical data, but it was definitely not the best talk in the session, nor the second best. Mine might have been the worst of the five talks in the session. But that’s okay; I think I did fine. It’s just that Sean Hanlon (brother of my UW–Madison colleague, Bret Hanlon) put together a superb, but thinly-attended, session.
Miriah Meyer’s was my favorite talk of the day. She develops visualization tools to help scientists make sense of their data. And her approach is much like mine: specific solutions to specific data and questions. She talked about MulteeSum, PathLine, and MizBee. Favorite quote: “It’s amazing how much people like circles these days.”
Frederick Streitz from Lawrence Livermore National Lab talked about simulating and visualizing the electrophysiology of the human heart at super-high resolution using a frigging huge cluster, with 1.5 million cores. I loved his analogies: if you are painting your house, having a friend or two over to help will reduce the time by the expected factor, but having 1000 friends or 100k friends to help? In parallel computing, you need to rethink what you’ll use the computers for.
His second analogy: The DOE cluster at Livermore is 100k times a desktop computer. That’s like the difference between PacMan (1980, 2.1 megaFLOPS) and Assassin’s Creed (2011, 260 gigaFLOPS). And their cluster is 100k times that.
At the end of the day, Daphne Koller talked about Coursera. She’s awesome; Coursera’s awesome; I’m a crappy teacher. That’s my thinking at the moment, anyway. (A video of her talk is online. Have I mentioned how much I hate it when people screw up the aspect ratio? It seems like they screwed up the aspect ratio.) University faculty exist to help people, and with Coursera and other MOOCs, we can help a lot of people. Key lessons: the value of peer grading (for learning), not being constrained by the classroom or the 60-min format, ability to explore possible teaching innovations, and just having a hugely broad reach.
College is a place where a professor’s lecture notes go straight to the students’ lecture notes, without passing through the brains of either.
Boy, am I old
I seem to be staying at the same hotel as the American Junior Academy of Sciences (AJAS). Are these high school or college students? Man, do I feel old.
My contribution to education, today: if all of the elevators going down are too packed to accept passengers, press the up button and ride it up and then down. (Later I learned, from one of the AJAS youth, that the “alarm will sound” sign at the bottom of the stairs is a lie. You can take the stairs.)
Moving from Ye Olde Standard Computational Science Practice to a fully reproducible workflow seems a monumental task, but partially reproducible is better than not-at-all reproducible, and it’d be good to give people some advice on how to get started – to encourage them to get started.
It’s a bit rough, and it could really use some examples, but it helped me to get my thoughts together for the Hackathon and hopefully will be useful to people (and something to build upon).
Is any task a more monumental waste of time than writing an introduction and discussion for a dissertation where the chapters are published?
I think many (or most?) of my colleagues would agree with her. The research and the papers are the important things, and theses are hardly read. Why spend time writing chapters that won’t be read?
My response was:
Intro & disc of thesis get the student to think about the broader context of their work.
I’d like to expand on that just a bit.
In the old days, a PhD dissertation was more of a monograph. The new style is to have three or so papers (published or ready-to-submit) as chapters, sandwiched between introductory and discussion chapters. Those intro and discussion chapters are sometimes quite thin. I would prefer them to be more substantial.
The focus on papers is a good thing, as they will be easier to find and more widely read. But a thesis/dissertation is not just a research product, but also a vehicle to get a student to think more deeply and broadly.
The individual papers will include introductory and discussion sections, but journal articles tend to be aimed towards a relatively narrow and specialized audience. More substantive introductory and discussion chapters can help to make the work accessible to a broader audience. They also help to tie the separate papers together: what is the larger scientific context, and how do these pieces of work fit into that?
I don’t want students wasting time on “busy work,” and writing a thesis does seem like busy work. But I think a thesis deserves more than a ten-paragraph introduction. And the value of that introduction is not so much in demonstrating the student’s knowledge, but in being part of the development of that knowledge.
I’m at the Scholarly Publishing Symposium at UW-Madison today. There’s an interesting list of supplemental materials, but apparently only on paper:
So here they are electronically.
- Know your Copy Rights, Association of Research Libraries
- Optimize Your Publishing, The Right to Research Coalition
- Right to Research, The Right to Research Coalition
- About SPARC (Scholarly Publishing and Academic Research Coalition)
- Open Data Factsheet, SPARC
- Open Education Factsheet, SPARC
- Open Access to Scholarly and Scientific Research Articles, SPARC
- SPARC Author Addendum
Terry Speed recently gave a talk on the role of statisticians in “Big Data” initiatives (see the video or just look at the slides). He points to the history of statisticians’ discussions of massive data sets (e.g., the Proceedings of a 1998 NRC workshop on Massive data sets) and how this history is being ignored in the current Big Data hype, and that statisticians, generally, are being ignored.
I was thinking of writing a polemic on the need for reform of academic statistics and biostatistics, but in reading back over Simply Statistics posts, I’ve decided that Rafael Irizarry and Jeff Leek have already said what I wanted to say, and so I think I’ll just summarize their points.
- We need to engage in real present-day problems
- Computing should be a big part of our PhD curriculum
- We need to deliver solutions
- We need to improve our communication skills
Jeff said, “Data science only poses a threat to (bio)statistics if we don’t adapt,” and made the following series of proposals:
- Remove some theoretical requirements and add computing requirements to statistics curricula.
- Focus on statistical writing, presentation, and communication as a main part of the curriculum.
- Focus on positive interactions with collaborators (being a scientist) rather than immediately going to the referee attitude.
- Add a unit on translating scientific problems to statistical problems.
- Add a unit on data munging and getting data from databases.
- Integrate real and live data analyses into our curricula.
- Make all our students create an R package (a data product) before they graduate.
- Most important of all, have a “big tent” attitude about what constitutes statistics.
I agree strongly with what they’ve written. To make it happen, we ultimately need to reform our values.
Currently, we (as a field) appear satisfied with
- Papers that report new methods with no usable software
- Applications that focus on toy problems
- Talks that skip the details of the scientific context of a problem
- Data visualizations that are both ugly and ineffective
Further, we tend to get more excited about the fanciness of a method than its usefulness.
We should value
- Usefulness above fanciness
- Tool building (e.g., usable software)
- Data visualization
- In-depth knowledge of the scientific context of a problem
In evaluating (bio)statistics faculty, we should consider not just the number of JASA or Biometrics papers they’ve published, but also whether they’ve made themselves useful, to the scientific community as well as to other statisticians.